Using distributed or parallel set-up in script?: no
Who can help?
@Narsil @sijunhe
Information
[ ] The official example scripts
[X] My own modified scripts
Tasks
[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[X] My own task or dataset (give details below)
Reproduction
from transformers import pipeline
urls = ["https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png", "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/tree.png"]
oracle = pipeline(task="vqa", model="dandelin/vilt-b32-finetuned-vqa")
oracle(question="What's in the image?", image=urls, top_k=1)
(Truncated) error:
TypeError                                 Traceback (most recent call last)
Cell In[1], line 11
      8 oracle = pipeline(task="vqa", model="dandelin/vilt-b32-finetuned-vqa", image_processor=image_processor)
      9 # for out in tqdm(oracle(question="What's in this image", image=dataset, top_k=1)):
     10 # print(out)
---> 11 oracle(question="What's in this image", image=urls, top_k=1)

File ~/dev/lab24-env/lib/python3.10/site-packages/transformers/pipelines/visual_question_answering.py:114, in VisualQuestionAnsweringPipeline.__call__(self, image, question, **kwargs)
    107 """
    108 Supports the following format
    109 - {"image": image, "question": question}
    110 - [{"image": image, "question": question}]
    111 - Generator and datasets
    112 """
    113 inputs = image
--> 114 results = super().__call__(inputs, **kwargs)
    115 return results

File ~/dev/lab24-env/lib/python3.10/site-packages/transformers/pipelines/base.py:1224, in Pipeline.__call__(self, inputs, num_workers, batch_size, *args, **kwargs)
   1220 if can_use_iterator:
   1221     final_iterator = self.get_iterator(
   1222         inputs, num_workers, batch_size, preprocess_params, forward_params, postprocess_params
   1223     )
-> 1224 outputs = list(final_iterator)
...
    120     inputs["question"], return_tensors=self.framework, padding=padding, truncation=truncation
    121 )
    122 image_features = self.image_processor(images=image, return_tensors=self.framework)

TypeError: string indices must be integers
This error is reproducible on the latest version (v4.41.2)
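My reading of the traceback (an assumption, since the middle frames are truncated): the list of URLs is forwarded unchanged as `inputs`, so each URL string reaches the preprocessing step on its own and is then indexed like a dict. The sketch below reproduces that failure mode without loading any model; `preprocess_like` is a hypothetical stand-in, not the library's code.

```python
# Hypothetical stand-in for a per-item preprocessing step that expects
# a dict such as {"image": ..., "question": ...}.
def preprocess_like(inputs):
    return inputs["image"], inputs["question"]


urls = [
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png",
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/tree.png",
]

try:
    # Each list element is a plain str, so dict-style indexing fails.
    preprocess_like(urls[0])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Indexing a `str` with a string key raises the same `TypeError` family as the one reported above.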
Expected behavior
The pipeline should broadcast the same question across all images and run the model on each image-question pair, as described in the documentation.
Note: The following currently works, but it is less convenient than passing the lists directly, and it does not allow passing a dataset directly:
oracle([{"question": "What's in the image?", "image": url} for url in urls])
Currently, the call function only handles a single image-question pair as input. I can open a quick PR so that it also handles lists of images and questions. I'm not sure how to handle the dataset case, though.
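Until list inputs are supported, the broadcasting could be done on the caller's side. This is only a sketch of what I have in mind: `broadcast_vqa_inputs` is a hypothetical helper name (not part of `transformers`), and the broadcasting rule (a single question is repeated for every image, equal-length lists are paired up) is my assumption about the intended behavior.

```python
from typing import Any, Dict, List, Sequence, Union


def broadcast_vqa_inputs(
    images: Union[Any, Sequence[Any]],
    questions: Union[str, Sequence[str]],
) -> List[Dict[str, Any]]:
    """Build the list-of-dicts input that the pipeline already accepts."""
    # Normalize scalars to one-element lists. A str is a single image
    # (URL or path), not a sequence of images.
    if isinstance(questions, str):
        questions = [questions]
    if isinstance(images, str) or not isinstance(images, (list, tuple)):
        images = [images]

    # Repeat whichever side has a single element.
    if len(questions) == 1:
        questions = list(questions) * len(images)
    elif len(images) == 1:
        images = list(images) * len(questions)

    if len(images) != len(questions):
        raise ValueError(
            f"Cannot pair {len(images)} images with {len(questions)} questions"
        )

    return [{"image": img, "question": q} for img, q in zip(images, questions)]


urls = [
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png",
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/tree.png",
]
pairs = broadcast_vqa_inputs(urls, "What's in the image?")
# The result can then be passed as: oracle(pairs, top_k=1)
```

The helper only reshapes the arguments into the dict format listed in the pipeline docstring, so it does nothing for the dataset case.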
System Info
transformers version: 4.42.0.dev0