huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Chameleon model fails after receiving the same inputs two times #32022

Closed francescortu closed 1 month ago

francescortu commented 1 month ago

System Info

Who can help?

@zucchini-nlp

Information

Tasks

Reproduction

After loading the model and processor:

import torch
from transformers import ChameleonProcessor, ChameleonForCausalLM

processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b", torch_dtype=torch.float16)
model = ChameleonForCausalLM.from_pretrained("facebook/chameleon-7b", torch_dtype=torch.float16, device_map="cuda:0")

from PIL import Image
image1 = Image.open("<path-to-image>")
image2 = Image.open("<path-to-image>")
prompt = "<image>"

inputs = processor(text=[prompt, prompt], images=[image1, image2], return_tensors="pt").to("cuda:0")

Execute the forward pass twice:

output = model(**inputs)
output = model(**inputs)

The second time I run the forward pass, it raises an error. After some investigation, I found that the problem is in the forward pass itself, which substitutes the image placeholder token in input_ids with the real image tokens produced by the image autoencoder. Because this substitution happens in place, the second call attempts to perform it again (pixel_values are still present), but the placeholder token is no longer there, since it was already replaced with the real image tokens on the first call.
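One quick way to confirm the in-place mutation (a small sketch, re-running the reproduction above with freshly processed inputs):

ids_before = inputs["input_ids"].clone()              # snapshot of the processed ids
output = model(**inputs)                              # first call works
print(torch.equal(ids_before, inputs["input_ids"]))   # False: the placeholder ids were overwritten in place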

Expected behavior

I expect the forward pass not to modify the input tensors.
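Until a fix lands, a possible workaround (just a sketch, not an official recommendation) is to hand the model a fresh copy of the processed tensors on every call, so the in-place edit never touches the original batch:

output1 = model(**{k: v.clone() for k, v in inputs.items()})
output2 = model(**{k: v.clone() for k, v in inputs.items()})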

amyeroberts commented 1 month ago

cc @zucchini-nlp

zucchini-nlp commented 1 month ago

@francescortu thanks for noticing, made a PR to fix!
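For context, the root cause is an in-place write into input_ids during the forward pass; a non-mutating variant of that substitution could look like the hypothetical helper below (an illustration of the idea only; the helper name is made up and the actual change in the PR may differ):

import torch

def merge_image_tokens(input_ids: torch.LongTensor,
                       image_tokens: torch.LongTensor,
                       image_token_id: int) -> torch.LongTensor:
    # Clone first, then write the real image tokens over the placeholder positions,
    # so the caller's input_ids tensor is left untouched.
    merged = input_ids.clone()
    mask = merged == image_token_id
    merged[mask] = image_tokens.flatten().to(merged.device, merged.dtype)
    return merged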