huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

ImageToTextPipeline does not support InstructBlip Models #27975

Open elena-soare20 opened 11 months ago

elena-soare20 commented 11 months ago

System Info

Who can help?

@Narsil @amyeroberts

Information

Tasks

Reproduction

import requests
from PIL import Image
from transformers import InstructBlipProcessor, pipeline

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-flan-t5-xl")
pipe = pipeline(
    "image-to-text",
    model="Salesforce/instructblip-flan-t5-xl",
    processor=processor.image_processor,
    tokenizer=processor.tokenizer,
    device=0,
)
prompt = "describe te following image"
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# This call raises the TypeError described under "Expected behavior".
pipe(images=image, prompt=prompt)

Expected behavior

The pipeline should return a textual description of the image. Instead, I get an error: TypeError: ones_like(): argument 'input' (position 1) must be Tensor, not NoneType

I suspect this is caused by ImageToTextPipeline.preprocess(), which should have custom behaviour for InstructBlip models so that the image and the text are processed in one go: inputs = processor(images=image, text=prompt, return_tensors="pt")
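As an illustration of the "one go" processing suggested above, here is a minimal sketch that bypasses the pipeline and calls the processor and the model directly. The helper names `load_instructblip` and `describe_image` and the `max_new_tokens` default are assumptions made for this example; they are not part of the library or of any proposed fix.

```python
def load_instructblip(checkpoint="Salesforce/instructblip-flan-t5-xl"):
    """Load the processor and model (hypothetical helper; downloads a large checkpoint)."""
    # Imported lazily so the rest of this sketch can be read and tested without the weights.
    from transformers import InstructBlipForConditionalGeneration, InstructBlipProcessor

    processor = InstructBlipProcessor.from_pretrained(checkpoint)
    model = InstructBlipForConditionalGeneration.from_pretrained(checkpoint)
    return model, processor


def describe_image(model, processor, image, prompt, max_new_tokens=50):
    """Process image and prompt together, generate, and decode the first result."""
    # One processor call handles both the image and the text prompt.
    inputs = processor(images=image, text=prompt, return_tensors="pt")
    # The processor output is dict-like, so it can be unpacked into generate().
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    decoded = processor.batch_decode(output_ids, skip_special_tokens=True)
    return decoded[0].strip()


# Intended usage with the objects from the reproduction (not executed here):
#   model, processor = load_instructblip()
#   print(describe_image(model, processor, image, prompt))
```

This keeps everything on the CPU; moving the model and the inputs to a GPU is left out to keep the sketch short.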

amyeroberts commented 10 months ago

Hi @elena-soare20, thanks for raising this issue!

Yes, at the moment InstructBLIP isn't compatible with the pipeline because of the specific processing it does - which is different from many other models. Specifically, it has two tokenizers to create qformer_input_ids and input_ids to be passed to the model. There's some ongoing work to unify our processors so that hopefully more models like these can be quickly integrated.
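To make the point about the two sets of token IDs concrete, the sketch below checks a preprocessing result for the inputs mentioned above. The helper name `missing_instructblip_inputs` and the exact key list are assumptions for illustration. A missing `qformer_input_ids` entry would be consistent with the reported `ones_like()` error, although the thread does not confirm that this is the exact failure path.

```python
# Keys an InstructBLIP forward/generate call is assumed to need in this sketch.
EXPECTED_KEYS = ("pixel_values", "input_ids", "qformer_input_ids")


def missing_instructblip_inputs(inputs):
    """Return the expected keys that are absent or None in a dict-like input batch."""
    return [key for key in EXPECTED_KEYS if inputs.get(key) is None]


# A generic image + single-tokenizer preprocess step would produce something like this:
generic_batch = {"pixel_values": "<image tensor>", "input_ids": "<token ids>"}
print(missing_instructblip_inputs(generic_batch))
```

Running the same check on the output of `InstructBlipProcessor` should report nothing missing, since that processor tokenizes the prompt for both the Q-Former and the language model.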

Happy to review any PRs from anyone in the community who would like to enable this. See also: #21110

nakranivaibhav commented 9 months ago

hey @amyeroberts I would be happy to work on this

amyeroberts commented 9 months ago

@nakranivaibhav Awesome! Feel free to ping me for review when you have a PR ready 🤗

nakranivaibhav commented 9 months ago

@amyeroberts Give me some time on this. The models are too large for me to reproduce the error easily. I am figuring out where I can reproduce it so that I can start working on it.

amyeroberts commented 9 months ago

@nakranivaibhav If all you need is a model to test functionality, i.e. a randomly initialized model that outputs nonsense is fine, then the small model used during tests might help here. The config to build the model and the test inputs can be found here.

nakranivaibhav commented 9 months ago

@amyeroberts Yes, that is what I need. Thank you for pointing it out.