Pixtral error: The following `model_kwargs` are not used by the model

TheDuckingDuck commented 1 month ago

System Info

Transformers version https://github.com/huggingface/transformers/commit/8bd2b1e8c23234cd607ca8d63f53c1edfea27462

Who can help?

@ArthurZucker @amyeroberts

Information

[X] The official example scripts
[ ] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

Running this code

from transformers import LlavaForConditionalGeneration, AutoProcessor
from PIL import Image

model_id = "hf-internal-testing/pixtral-12b"
model = LlavaForConditionalGeneration.from_pretrained(model_id, low_cpu_mem_usage=True, load_in_8bit=True)
processor = AutoProcessor.from_pretrained(model_id)

IMG_URLS = [
    "https://picsum.photos/id/237/400/300",
    "https://picsum.photos/id/231/200/300",
    "https://picsum.photos/id/27/500/500",
    "https://picsum.photos/id/17/150/600",
]
PROMPT = "<s>[INST]Describe the images.\n[IMG][IMG][IMG][IMG][/INST]"

inputs = processor(text=PROMPT, images=IMG_URLS, return_tensors="pt").to("cuda")
generate_ids = model.generate(**inputs, max_new_tokens=500)
output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

EXPECTED_GENERATION = """
Describe the images.
Sure, let's break down each image description:

1. **Image 1:**
   - **Description:** A black dog with a glossy coat is sitting on a wooden floor. The dog has a focused expression and is looking directly at the camera.
   - **Details:** The wooden floor has a rustic appearance with visible wood grain patterns. The dog's eyes are a striking color, possibly brown or amber, which contrasts with its black fur.

2. **Image 2:**
   - **Description:** A scenic view of a mountainous landscape with a winding road cutting through it. The road is surrounded by lush green vegetation and leads to a distant valley.
   - **Details:** The mountains are rugged with steep slopes, and the sky is clear, indicating good weather. The winding road adds a sense of depth and perspective to the image.

3. **Image 3:**
   - **Description:** A beach scene with waves crashing against the shore. There are several people in the water and on the beach, enjoying the waves and the sunset.
   - **Details:** The waves are powerful, creating a dynamic and lively atmosphere. The sky is painted with hues of orange and pink from the setting sun, adding a warm glow to the scene.

4. **Image 4:**
   - **Description:** A garden path leading to a large tree with a bench underneath it. The path is bordered by well-maintained grass and flowers.
   - **Details:** The path is made of small stones or gravel, and the tree provides a shaded area with the bench invitingly placed beneath it. The surrounding area is lush and green, suggesting a well-kept garden.

Each image captures a different scene, from a close-up of a dog to expansive natural landscapes, showcasing various elements of nature and human interaction with it.
"""

Expected behavior

The script should run without crashing. Instead it fails with:

  File "/mnt/688LD58782D4FA20/XComp/CogVLM2/basic_demo/hf.py", line 17, in <module>
    generate_ids = model.generate(**inputs, max_new_tokens=500)
  File "/mnt/688LD58782D4FA20/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/688LD58782D4FA20/.venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1811, in generate
    self._validate_model_kwargs(model_kwargs.copy())
  File "/mnt/688LD58782D4FA20/.venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1215, in _validate_model_kwargs
    raise ValueError(
ValueError: The following `model_kwargs` are not used by the model: ['token_type_ids'] (note: typos in the generate arguments will also show up in this list)

ArthurZucker commented 1 month ago

Hey! This is just the tokenization config; the hf-internal-testing checkpoints are meant for testing. I think a checkpoint was converted by a community member here: https://huggingface.co/Himetsu/pixtral-12b
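
If the tokenizer config keeps emitting `token_type_ids`, one possible workaround is to drop that key before calling `generate`. The following is only a minimal sketch: it assumes the processor output can be converted to a plain dict, and the helper name `drop_unused_model_kwargs` is hypothetical, not part of transformers.

```python
def drop_unused_model_kwargs(inputs, unused=("token_type_ids",)):
    """Return a plain-dict copy of `inputs` without the keys listed in `unused`.

    Hypothetical helper for illustration; `inputs` is assumed to be dict-like.
    """
    return {key: value for key, value in dict(inputs).items() if key not in unused}


# Stand-in for the processor output, so the sketch runs without a model.
example_inputs = {
    "input_ids": [[1, 2, 3]],
    "attention_mask": [[1, 1, 1]],
    "token_type_ids": [[0, 0, 0]],
}
filtered_inputs = drop_unused_model_kwargs(example_inputs)
print(sorted(filtered_inputs))
```

With the script from the issue, the call would then look like `model.generate(**drop_unused_model_kwargs(inputs), max_new_tokens=500)`. This only hides the symptom; a checkpoint with a correct tokenizer config avoids the problem at the source.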

IdiotSandwichTheThird commented 1 month ago

That does get me a bit further, but the whole thing still crashes before generating, with:

    query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[1, 2242, 32, 160]' is invalid for input of size 9183232

Which is not an error I feel like I am capable of debugging.

Full log:

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 6/6 [00:56<00:00,  9.50s/it]
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Traceback (most recent call last):
  File "F:\XComp\CogVLM2\basic_demo\hf.py", line 36, in <module>
    generate_ids = model.generate(**inputs, max_new_tokens=500)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2050, in generate
    result = self._sample(
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 3000, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llava\modeling_llava.py", line 519, in forward
    outputs = self.language_model(
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\mistral\modeling_mistral.py", line 1033, in forward
    outputs = self.model(
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\mistral\modeling_mistral.py", line 810, in forward
    layer_outputs = decoder_layer(
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\mistral\modeling_mistral.py", line 550, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\Users\Bunner\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\mistral\modeling_mistral.py", line 448, in forward
    query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[1, 2242, 32, 160]' is invalid for input of size 9183232
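
The numbers in the error message can be checked by hand. The sketch below assumes the four entries of the shape are `(bsz, q_len, num_heads, head_dim)`, as in the `view` call in the traceback; the function name `explain_view_mismatch` is hypothetical.

```python
def explain_view_mismatch(shape, numel):
    """Compare the element count a view() asks for with what the tensor holds.

    `shape` is assumed to be (bsz, q_len, num_heads, head_dim).
    """
    bsz, q_len, num_heads, head_dim = shape
    requested = bsz * q_len * num_heads * head_dim
    # Features the projection actually produced for each token.
    features_per_token = numel // (bsz * q_len)
    return {
        "requested_elements": requested,
        "actual_elements": numel,
        "features_per_token": features_per_token,
        "implied_head_dim": features_per_token // num_heads,
        "hidden_size_if_head_dim_derived": num_heads * head_dim,
    }


report = explain_view_mismatch((1, 2242, 32, 160), 9183232)
for name, value in report.items():
    print(f"{name}: {value}")
```

The tensor holds 4096 features per token, i.e. 32 heads of 128, while the code asks for 32 heads of 160 (5120 features). That would be consistent with a checkpoint whose attention uses a head dimension of 128 while the installed modeling code derives 160 from `hidden_size // num_heads`, but this is an inference from the arithmetic, not something confirmed in this thread.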

amyeroberts commented 1 month ago

Hi @IdiotSandwichTheThird, could you share your environment information and an example code snippet? I'm unable to reproduce this error using the mistral community checkpoint and the code example in the issue description.
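
The environment information is usually gathered with `transformers-cli env`. If that is not convenient, a rough equivalent can be sketched with the standard library; the package list below is an assumption about what is relevant here, and `collect_env` is a hypothetical helper.

```python
import platform
from importlib import metadata


def collect_env(packages=("transformers", "torch", "accelerate", "bitsandbytes", "pillow")):
    """Collect platform, Python version, and installed versions of `packages`."""
    info = {
        "platform": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing.
            info[name] = "not installed"
    return info


env_info = collect_env()
for name, value in env_info.items():
    print(f"{name}: {value}")
```

Pasting this output alongside the exact script makes it easier to tell whether the two tracebacks above come from different transformers versions.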
