Pernekhan opened this issue 2 days ago
Hi @Pernekhan, thanks for opening an issue! Just curious what is the use-case for such a size?
I personally don't know the use case, but we see some people passing such images in our system.
Actually, I'm not sure mllama is intended to be used with such an image size, so that's probably fine that it crashed rather than silently producing some weird result.
Actually, mllama is smart enough to work with such small images. For example, when you pass a 1x2 image, Llama-3.2-11B-Vision-Instruct is able to respond with something like this: The image is a solid black background with no discernible objects or features. There is no title or description provided for the image. The image appears to be a blank or empty space, with no visual elements or details to suggest what it might represent. It could be a placeholder image, a test image, or simply a blank slate for further content or design. Without more context, it's difficult to provide a more specific interpretation of the image.
1x1 or Wx1 images are also valid images, and mllama should be able to work with them (it already works with 1xH and 2x2 images).
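For reference, such degenerate sizes really are valid images at the PIL level (a minimal sketch, independent of transformers):

```python
from PIL import Image

# Degenerate sizes are valid PIL images: width or height of 1 is allowed
tiny = Image.new("RGB", (1, 1))
wide = Image.new("RGB", (64, 1))
print(tiny.size, wide.size)  # (1, 1) (64, 1)
```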
Thanks for the clarification! I still think it might be an error in someone's request and can't imagine a meaningful use case, but I might be missing something 🙂
In case the fix is simple and does not bloat the code with special-case conditions we would appreciate it. Would you have the bandwidth to submit a fix?
Yes, it's hard to imagine a use case for it, but we do see such cases in production.
My plate is full at the moment, but maybe someone else can pick it up?
Also, it's similar to this issue: https://github.com/huggingface/transformers/issues/21638
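Until this is fixed upstream, one user-side workaround (a sketch only, not transformers code; `ensure_min_size` is a hypothetical helper) is to upscale degenerate images before handing them to the processor:

```python
from PIL import Image

def ensure_min_size(image: Image.Image, min_side: int = 2) -> Image.Image:
    """Upscale images whose width or height is below min_side.

    Nearest-neighbour resampling just repeats the existing pixel(s),
    so the image content is preserved.
    """
    width, height = image.size
    if width >= min_side and height >= min_side:
        return image
    return image.resize(
        (max(width, min_side), max(height, min_side)),
        resample=Image.NEAREST,
    )

tiny = Image.new("RGB", (1, 1))
print(ensure_min_size(tiny).size)  # (2, 2)
```

This keeps the special-casing out of the library entirely, at the cost of preprocessing slightly different pixels than the caller supplied.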
Hi everyone! We decided to work on this issue during the Hacktoberfest meetup of PyData Madrid (we had around 90-100 minutes to work on the issue at hand). Some things we discovered:
```python
from transformers import AutoImageProcessor
from PIL import Image

if __name__ == "__main__":
    image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
    data = Image.new("RGB", (1, 1))
    data = image_processor.preprocess(data, return_tensors="pt").data
    print(data)
```
Our current assumption is that when the values are passed down to the normalize step, normalization fails, resulting in the problem described in this issue. Since this code path is shared by most vision models, the problem should affect them as well.
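For context, the normalize step is conceptually just a per-channel `(pixel - mean) / std`. A minimal numpy sketch (the 0.5 mean/std values are an assumption for illustration, not necessarily what this checkpoint uses) shows the arithmetic itself is well-defined even for a 1x1 input, so the failure is presumably elsewhere in how the values reach it:

```python
import numpy as np

# A 1x1 RGB "image" in channels-last layout, all black
pixels = np.zeros((1, 1, 3), dtype=np.float32)

mean = np.array([0.5, 0.5, 0.5], dtype=np.float32)  # assumed defaults
std = np.array([0.5, 0.5, 0.5], dtype=np.float32)

normalized = (pixels - mean) / std
print(normalized)  # every channel becomes (0.0 - 0.5) / 0.5 = -1.0
```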
P.S. To be really meta, I asked the free Meta Llama Vision endpoint at Together about the usage of 1x1 pixel images, asking whether these could be tracking pixels. These are the prompt and the response:
Prompt: "I scraped the content of a web site, after obtaining permission from the owner. Then I processed the images present and discovered a 1x1 pixel that was barely visible in the page. What could this 1x1 pixel image be for? Is there any use for such images in web pages? Could this be a tracking pixel? Reply in English."
Response: "The infamous 1x1 pixel image!
Yes, it's not uncommon to find tiny images like this on web pages. Here are some possible explanations:
To determine the purpose of the 1x1 pixel image you found, you can try the following:
Keep in mind that not all 1x1 pixel images are malicious or invasive. Many are used for legitimate purposes, such as analytics or advertising. However, if you're concerned about user privacy or security, it's always a good idea to investigate further."
System Info
When an image of size 1x1 or Wx1 is passed, the normalize() method crashes with the following error:
Who can help?
@amyeroberts, @qubvel
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Expected behavior
It shouldn't crash.