@7AtAri opened this issue 1 day ago
@7AtAri hey, which version of transformers are you using? Can you try updating to the latest version, v4.46, with `pip install transformers==4.46`? It works for me on the latest version.
With transformers==4.46 the same error persists.
@7AtAri it works for me with transformers 4.46 using the demo code from the Hub. If you are using a Jupyter notebook, make sure you restart the kernel and that the transformers version being imported is indeed v4.46. If the error persists, please share your environment with `transformers-cli env`.
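To confirm that a notebook kernel is really using the newly installed version (the stale-kernel case mentioned above), one option is to compare the version of the module already imported in memory with the version recorded in the installed package metadata. This is a minimal sketch using only the standard library; the helper name `loaded_and_installed` is hypothetical, and it assumes the import name matches the distribution name, which holds for `transformers`.

```python
import importlib
import sys
from importlib import metadata
from typing import Optional, Tuple


def loaded_and_installed(pkg: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (version of the imported module, version installed on disk).

    Hypothetical helper: assumes the import name equals the distribution
    name, which is true for `transformers`.
    """
    # Prefer the module object already loaded in this process; that is the
    # one a long-running notebook kernel actually uses.
    module = sys.modules.get(pkg)
    if module is None:
        try:
            module = importlib.import_module(pkg)
        except ImportError:
            module = None
    loaded = getattr(module, "__version__", None)

    # Read the version from the installed package metadata (what pip installed).
    try:
        installed = metadata.version(pkg)
    except metadata.PackageNotFoundError:
        installed = None
    return loaded, installed


if __name__ == "__main__":
    loaded, installed = loaded_and_installed("transformers")
    print(f"imported: {loaded}, installed: {installed}")
    if loaded != installed:
        print("Mismatch: restart the kernel so the installed version is imported.")
    elif loaded is not None and loaded.split(".")[:2] != ["4", "46"]:
        print("Not v4.46; try `pip install transformers==4.46` and restart the kernel.")
```

If `transformers` was already imported before running `pip install transformers==4.46` in the same kernel, this check should typically report a mismatch until the kernel is restarted.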
To be honest, the code should never go down the `_merge_input_ids_with_image_features` path.
```python
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch
from PIL import Image
import requests

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="cuda:0",
)

# prepare image and text prompt, using the appropriate prompt template
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Define a chat history and use `apply_chat_template` to get correctly formatted prompt
# Each value in "content" has to be a list of dicts with types ("text", "image")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda:0")

# autoregressively complete prompt
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```
System Info
Container built on the 24th of October, with the following packages installed via pip: tqdm, torch, torchvision, transformers, deepspeed==0.15.2, accelerate, wandb, lightning, optuna, ray[tune], pyarrow, nltk, pandas, numpy, matplotlib, scipy, scikit-learn, bitsandbytes, peft, pillow, and flash-attn (installed with `pip install flash-attn --no-build-isolation`).
Who can help?
@zucchini-nlp @arthur
Reproduction
Using a batch size of 8 in a `train_collate` function, with both `AutoProcessor` and the LLaVA-NeXT processor (`LlavaNextProcessor`), raises the following error in the forward pass:
```
    return self.model.forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llava_next/modeling_llava_next.py", line 873, in forward
    inputs_embeds, attention_mask, position_ids, labels, _ = self._merge_input_ids_with_image_features(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llava_next/modeling_llava_next.py", line 551, in _merge_input_ids_with_image_features
    raise ValueError(
ValueError: Number of image tokens in input_ids (2040) different from num_images (8).
```
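To see how the 2040 image tokens reported in the error are spread across the 8 samples, one option is to count the image-token id in each row of the collated `input_ids` before calling the model. This is a minimal sketch; `image_tokens_per_sample` is a hypothetical helper, and the image token id is assumed to come from `model.config.image_token_index` (or `processor.tokenizer.convert_tokens_to_ids("<image>")`). The toy batch below uses a made-up id purely for illustration.

```python
from typing import Iterable, List, Sequence


def image_tokens_per_sample(input_ids: Iterable[Sequence[int]], image_token_id: int) -> List[int]:
    """Count occurrences of image_token_id in each row of a batch.

    Accepts a list of lists or a 2-D tensor/array (whose rows have .tolist()).
    """
    counts = []
    for row in input_ids:
        # Tensors and arrays expose .tolist(); plain sequences are used as-is.
        ids = row.tolist() if hasattr(row, "tolist") else list(row)
        counts.append(sum(1 for token in ids if token == image_token_id))
    return counts


# Toy batch with a made-up image token id of 32000 (illustration only).
toy_batch = [[1, 32000, 32000, 32000, 2], [1, 32000, 2, 0, 0]]
print(image_tokens_per_sample(toy_batch, image_token_id=32000))  # [3, 1]
```

On a real batch, something like `image_tokens_per_sample(batch["input_ids"], model.config.image_token_index)` should sum to the first number in the error message; comparing the per-sample counts, for example against a run without truncation if the collate function truncates (an assumption, since the collate code is not shown), may help narrow down where the mismatch comes from.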
Expected behavior
The forward pass should run without this error, as it did before.