elena-soare20 opened 11 months ago
Hi @elena-soare20, thanks for raising this issue!
Yes, at the moment InstructBLIP isn't compatible with the pipeline because of the specific processing it does, which differs from many other models. Specifically, it has two tokenizers that create qformer_input_ids and input_ids to be passed to the model. There's some ongoing work to unify our processors so that hopefully more models like this can be quickly integrated.
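For illustration, here is a minimal sketch of what the InstructBLIP processor produces in a single call (the key names follow the InstructBlipProcessor documentation; the checkpoint and prompt are just examples), compared to the single input_ids a generic pipeline prepares:

import requests
from PIL import Image
from transformers import InstructBlipProcessor

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-flan-t5-xl")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The processor runs the image processor and both tokenizers in one call.
inputs = processor(images=image, text="describe the following image", return_tensors="pt")
print(inputs.keys())
# Expected keys: pixel_values, input_ids, attention_mask,
# qformer_input_ids, qformer_attention_mask

A generic image-to-text pipeline only prepares pixel_values and input_ids, so the qformer inputs are never created.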
Happy to review any PRs for anyone in the community who would like to enable this. See also: #21110
Hey @amyeroberts, I would be happy to work on this.
@nakranivaibhav Awesome! Feel free to ping me for review when you have a PR ready 🤗
@amyeroberts Give me some time on this. The models are very large, which makes it hard to reproduce the error locally. I'm figuring out where I can reproduce it so I can start working on a fix.
@nakranivaibhav If all you need is a model to test functionality (i.e. a randomly initialized model that outputs nonsense is fine), then the small model used during tests might help here. The config to build the model and the test inputs can be found here.
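As a rough sketch of that idea, you can build a tiny, randomly initialized InstructBLIP purely from configs rather than downloading the full checkpoint. The sizes below are illustrative assumptions, not the values from the actual test config linked above:

from transformers import (
    InstructBlipConfig,
    InstructBlipForConditionalGeneration,
    InstructBlipQFormerConfig,
    InstructBlipVisionConfig,
    T5Config,
)

# Tiny illustrative sizes so the model builds in seconds; the real test
# config uses different values.
vision_config = InstructBlipVisionConfig(
    hidden_size=32, intermediate_size=64, num_hidden_layers=2,
    num_attention_heads=4, image_size=30, patch_size=15,
)
qformer_config = InstructBlipQFormerConfig(
    hidden_size=32, intermediate_size=64, num_hidden_layers=2,
    num_attention_heads=4, encoder_hidden_size=32,  # must match the vision hidden size
)
text_config = T5Config(d_model=32, d_ff=64, num_layers=2, num_heads=4)

config = InstructBlipConfig.from_vision_qformer_text_configs(
    vision_config, qformer_config, text_config
)
model = InstructBlipForConditionalGeneration(config)  # random weights, nonsense outputs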
@amyeroberts Yes, that is what I need. Thank you for pointing it out.
System Info

transformers version: 4.36.0.dev0

Who can help?

@Narsil @amyeroberts
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
import requests
from PIL import Image
from transformers import InstructBlipProcessor, pipeline

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-flan-t5-xl")
pipe = pipeline(
    "image-to-text",
    model="Salesforce/instructblip-flan-t5-xl",
    processor=processor.image_processor,
    tokenizer=processor.tokenizer,
    device=0,
)
prompt = "describe the following image"
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

pipe(images=image, prompt=prompt)
Expected behavior
The pipeline should return a textual description of the image. Instead, I get an error:
TypeError: ones_like(): argument 'input' (position 1) must be Tensor, not NoneType
I suspect this is caused by ImageToTextPipeline.preprocess(), where we should have custom behaviour for InstructBlip models to process the image and text in one go:

inputs = processor(images=image, text=prompt, return_tensors="pt")
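As a rough illustration of that idea, here is a sketch of a pipeline subclass whose preprocess() runs the full InstructBlipProcessor on the image and text together, so qformer_input_ids is populated alongside input_ids and pixel_values. Note that InstructBlipImageToTextPipeline is a hypothetical name for this sketch, not an existing class, and this is not the fix that would land upstream:

from transformers import ImageToTextPipeline

class InstructBlipImageToTextPipeline(ImageToTextPipeline):
    # Hypothetical subclass: keep a reference to the full processor and
    # use it in preprocess() instead of the separate tokenizer/image processor.
    def __init__(self, instructblip_processor=None, **kwargs):
        super().__init__(**kwargs)
        self.instructblip_processor = instructblip_processor

    def preprocess(self, image, prompt=None, timeout=None):
        # Run the image processor and both tokenizers in a single call.
        return self.instructblip_processor(
            images=image, text=prompt, return_tensors=self.framework
        )

# Usage sketch, assuming model and processor are loaded as in the reproduction above:
# pipe = InstructBlipImageToTextPipeline(
#     model=model,
#     tokenizer=processor.tokenizer,
#     image_processor=processor.image_processor,
#     instructblip_processor=processor,
# )
# pipe(images=image, prompt=prompt)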