Open — RonanKMcGovern opened this issue 2 weeks ago
Thanks for the issue @RonanKMcGovern, I can confirm. The thing to change is the Hub repository source; that checkpoint is what triggers the mismatch. This one, https://huggingface.co/mistral-community/pixtral-12b, is a converted version of Pixtral that is transformers-compatible and should load without a mismatch. You are right that the code example should be updated. I.e., this works with transformers:
```python
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "mistral-community/pixtral-12b"
# Load the model on the GPU so it matches the device the inputs are moved to below.
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="cuda"
)
processor = AutoProcessor.from_pretrained(model_id)

# The Pixtral processor accepts image URLs directly.
IMG_URLS = [
    "https://picsum.photos/id/237/400/300",
    "https://picsum.photos/id/231/200/300",
    "https://picsum.photos/id/27/500/500",
    "https://picsum.photos/id/17/150/600",
]
# One [IMG] placeholder per image in IMG_URLS.
PROMPT = "<s>[INST]Describe the images.\n[IMG][IMG][IMG][IMG][/INST]"

inputs = processor(text=PROMPT, images=IMG_URLS, return_tensors="pt").to("cuda")
generate_ids = model.generate(**inputs, max_new_tokens=500)
output = processor.batch_decode(
    generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(output)
```
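The raw prompt string above is easy to get wrong when the number of images changes. A small helper that builds it with one `[IMG]` placeholder per image (the function name is my own, not part of transformers):

```python
def build_pixtral_prompt(instruction: str, num_images: int) -> str:
    # Pixtral's raw chat template: instruction followed by one [IMG]
    # placeholder per image, wrapped in [INST]...[/INST].
    img_tokens = "[IMG]" * num_images
    return f"<s>[INST]{instruction}\n{img_tokens}[/INST]"

# Reproduces the PROMPT used above for four images:
prompt = build_pixtral_prompt("Describe the images.", 4)
```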
That's excellent, thanks. R
System Info
transformers version: 4.45.0.dev0
Who can help?
@amyeroberts @ArthurZucker
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
I'm running the exact code shown on this page:
Error:
Expected behavior
I would expect the model to load normally; something is off in the dimensions. Is there perhaps another model version on the Hugging Face Hub with the correct config? Many thanks.
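To narrow down where the dimensions disagree, one option is to diff the checkpoint's parameter shapes against those of the freshly initialized model. A minimal sketch (the helper and the plain-dict shape format are illustrative, not a transformers API; the dicts could be built from `model.state_dict()` shapes on each side):

```python
def find_shape_mismatches(checkpoint_shapes: dict, model_shapes: dict) -> list:
    # Return the parameter names whose checkpoint shape differs from the
    # shape the model architecture expects.
    return sorted(
        name
        for name, shape in checkpoint_shapes.items()
        if name in model_shapes and model_shapes[name] != shape
    )

# Hypothetical example: one vision-tower weight disagrees.
mismatched = find_shape_mismatches(
    {"vision_tower.proj.weight": (1024, 4096), "lm_head.weight": (32064, 5120)},
    {"vision_tower.proj.weight": (1024, 5120), "lm_head.weight": (32064, 5120)},
)
```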
P.S. I had to uninstall flash-attn. I assume it's simply not supported; that would be worth noting in the docs.
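On the flash-attn point: rather than uninstalling, transformers lets you choose the attention backend via the `attn_implementation` argument to `from_pretrained`. A hedged sketch that falls back to the default eager attention when flash-attn is unavailable (the helper name is my own):

```python
import importlib.util

def pick_attn_implementation() -> str:
    # Use FlashAttention-2 only if the flash_attn package is importable;
    # otherwise fall back to the default eager implementation.
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "eager"

# Usage sketch (assumes transformers is installed):
# model = LlavaForConditionalGeneration.from_pretrained(
#     "mistral-community/pixtral-12b",
#     attn_implementation=pick_attn_implementation(),
# )
```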