huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Pixtral error: The following `model_kwargs` are not used by the model #33490

Closed: TheDuckingDuck closed this issue 1 month ago

TheDuckingDuck commented 1 month ago

System Info

Transformers version https://github.com/huggingface/transformers/commit/8bd2b1e8c23234cd607ca8d63f53c1edfea27462

Who can help?

@ArthurZucker @amyeroberts

Reproduction

Running this code

from transformers import LlavaForConditionalGeneration, AutoProcessor
from PIL import Image

model_id = "hf-internal-testing/pixtral-12b"
model = LlavaForConditionalGeneration.from_pretrained(model_id, low_cpu_mem_usage=True, load_in_8bit=True)
processor = AutoProcessor.from_pretrained(model_id)

IMG_URLS = [
    "https://picsum.photos/id/237/400/300",
    "https://picsum.photos/id/231/200/300",
    "https://picsum.photos/id/27/500/500",
    "https://picsum.photos/id/17/150/600",
]
PROMPT = "<s>[INST]Describe the images.\n[IMG][IMG][IMG][IMG][/INST]"

inputs = processor(text=PROMPT, images=IMG_URLS, return_tensors="pt").to("cuda")
generate_ids = model.generate(**inputs, max_new_tokens=500)
output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

EXPECTED_GENERATION = """
Describe the images.
Sure, let's break down each image description:

1. **Image 1:**
   - **Description:** A black dog with a glossy coat is sitting on a wooden floor. The dog has a focused expression and is looking directly at the camera.
   - **Details:** The wooden floor has a rustic appearance with visible wood grain patterns. The dog's eyes are a striking color, possibly brown or amber, which contrasts with its black fur.

2. **Image 2:**
   - **Description:** A scenic view of a mountainous landscape with a winding road cutting through it. The road is surrounded by lush green vegetation and leads to a distant valley.
   - **Details:** The mountains are rugged with steep slopes, and the sky is clear, indicating good weather. The winding road adds a sense of depth and perspective to the image.

3. **Image 3:**
   - **Description:** A beach scene with waves crashing against the shore. There are several people in the water and on the beach, enjoying the waves and the sunset.
   - **Details:** The waves are powerful, creating a dynamic and lively atmosphere. The sky is painted with hues of orange and pink from the setting sun, adding a warm glow to the scene.

4. **Image 4:**
   - **Description:** A garden path leading to a large tree with a bench underneath it. The path is bordered by well-maintained grass and flowers.
   - **Details:** The path is made of small stones or gravel, and the tree provides a shaded area with the bench invitingly placed beneath it. The surrounding area is lush and green, suggesting a well-kept garden.

Each image captures a different scene, from a close-up of a dog to expansive natural landscapes, showcasing various elements of nature and human interaction with it.
"""

Expected behavior

The script should run without crashing. Instead it fails with:

  File "/mnt/688LD58782D4FA20/XComp/CogVLM2/basic_demo/hf.py", line 17, in <module>
    generate_ids = model.generate(**inputs, max_new_tokens=500)
  File "/mnt/688LD58782D4FA20/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/688LD58782D4FA20/.venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1811, in generate
    self._validate_model_kwargs(model_kwargs.copy())
  File "/mnt/688LD58782D4FA20/.venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1215, in _validate_model_kwargs
    raise ValueError(
ValueError: The following `model_kwargs` are not used by the model: ['token_type_ids'] (note: typos in the generate arguments will also show up in this list)
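
A possible workaround for the error above, sketched under assumptions: the traceback shows `generate` rejecting the `token_type_ids` entry returned by the processor, so dropping that key before the call may get past the validation step. The helper name `drop_unused_model_kwargs` is hypothetical and not part of transformers, and this thread does not confirm that generation then succeeds with this checkpoint.

```python
def drop_unused_model_kwargs(inputs, unused=("token_type_ids",)):
    """Return a plain dict copy of `inputs` without the keys listed in `unused`.

    `inputs` can be any mapping with an `.items()` method, such as the
    object returned by the processor call in the reproduction script.
    The original mapping is left untouched.
    """
    return {key: value for key, value in inputs.items() if key not in unused}


# Hypothetical usage with the reproduction script (assumes `model` and
# `inputs` are already defined as in the issue):
#
#   cleaned = drop_unused_model_kwargs(inputs)
#   generate_ids = model.generate(**cleaned, max_new_tokens=500)
```

Filtering on the caller's side keeps the change local to the script, at the cost of hiding a mismatch that is better fixed in the checkpoint's tokenizer configuration.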

ArthurZucker commented 1 month ago

Hey! This is just the tokenization config; hf-internal-testing is meant for testing. I think a checkpoint was converted by a community member here: https://huggingface.co/Himetsu/pixtral-12b

IdiotSandwichTheThird commented 1 month ago

> Hey! This is just the tokenization config; hf-internal-testing is meant for testing. I think a checkpoint was converted by a community member here: https://huggingface.co/Himetsu/pixtral-12b

That does get me a bit further, but the whole thing still crashes before generating with


    query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[1, 2242, 32, 160]' is invalid for input of size 9183232

Which is not an error I feel like I am capable of debugging.

Full log:

Loading checkpoint shards: 100%|██████████| 6/6 [00:56<00:00,  9.50s/it]
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Traceback (most recent call last):
  File "F:\XComp\CogVLM2\basic_demo\hf.py", line 36, in <module>
    generate_ids = model.generate(**inputs, max_new_tokens=500)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2050, in generate
    result = self._sample(
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 3000, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llava\modeling_llava.py", line 519, in forward
    outputs = self.language_model(
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\mistral\modeling_mistral.py", line 1033, in forward
    outputs = self.model(
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\mistral\modeling_mistral.py", line 810, in forward
    layer_outputs = decoder_layer(
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\mistral\modeling_mistral.py", line 550, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\mistral\modeling_mistral.py", line 448, in forward
    query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[1, 2242, 32, 160]' is invalid for input of size 9183232
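
The numbers in the `RuntimeError` above can be checked by hand. The following is a minimal sketch in plain Python; the helper `infer_head_dim` is hypothetical and not a transformers function. The requested view needs 1 × 2242 × 32 × 160 elements, but the tensor holds 9183232, which works out to 4096 values per token, or 128 per head across 32 heads. One reading, not confirmed in this thread, is that the checkpoint's query projection uses a head dimension of 128 while the installed modeling code assumes 160.

```python
def infer_head_dim(numel, bsz, q_len, num_heads):
    """Return the per-head width implied by a tensor's total element count."""
    per_token, remainder = divmod(numel, bsz * q_len)
    if remainder:
        raise ValueError("element count is not divisible by bsz * q_len")
    head_dim, remainder = divmod(per_token, num_heads)
    if remainder:
        raise ValueError("per-token width is not divisible by num_heads")
    return head_dim


# Values copied from the error message.
bsz, q_len, num_heads, assumed_head_dim = 1, 2242, 32, 160
actual_numel = 9183232

# Element count that view(bsz, q_len, num_heads, head_dim) would require.
expected_numel = bsz * q_len * num_heads * assumed_head_dim
# Head width that the actual tensor size corresponds to.
actual_head_dim = infer_head_dim(actual_numel, bsz, q_len, num_heads)

print(expected_numel)   # 11479040
print(actual_head_dim)  # 128
```

If that reading is right, the mismatch lies between the checkpoint's configuration and the installed modeling code rather than in the input images or the prompt.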

amyeroberts commented 1 month ago

Hi @IdiotSandwichTheThird, could you share your environment information and an example code snippet? I'm unable to reproduce this error using the mistral community checkpoint and the code example in the issue description