mgoin opened 1 day ago
Hi @mgoin! Sounds great! Thanks for working on this 🤗
cc @yonigozlan maybe if you have bandwidth
Thanks for the review and context @yonigozlan! I will look into it later today. Yes, you are correct about using it within a Processor; however, I have tested that this works within vLLM simply by adding `use_fast=True` to our `AutoProcessor.from_pretrained()` call here. No need to manually specify the Processor class.
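For reference, the call might look like this (a minimal sketch; the checkpoint id below is illustrative, and loading it requires network access):

```python
from transformers import AutoProcessor

# use_fast=True opts into the torchvision-based fast image processor.
# The checkpoint id is illustrative -- substitute the Pixtral checkpoint you use.
processor = AutoProcessor.from_pretrained(
    "mistral-community/pixtral-12b", use_fast=True
)
```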
One bug I noticed is that if I specify `use_fast=True` and there isn't a Fast version of the ImageProcessor available, I get an exception. I can look into this, but it would be good to get clarity on whether this is unintended behavior.
Oh, great news that it already works with AutoProcessor. As I said, this is the first fast image processor used in a Processor, so it was not guaranteed :).
> One bug I noticed is that if I specify `use_fast=True` and there isn't a Fast version of the ImageProcessor available, I get an exception. I can look into this, but would be good to get clarity that this is unintended behavior.
Yes, this is currently the same when using ImageProcessingAuto. I don't think it should be that way though, especially as more and more people will want to use fast image processors by default. I'll open a PR to fix this.
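A sketch of what that fallback could look like (the function and class names here are hypothetical, not the actual transformers internals): when `use_fast=True` is requested but no fast class is registered, warn and fall back to the slow processor instead of raising.

```python
import warnings

def resolve_image_processor_class(slow_cls, fast_cls, use_fast):
    """Pick the fast class when requested and available; otherwise fall back."""
    if use_fast:
        if fast_cls is not None:
            return fast_cls
        # Instead of raising, degrade gracefully to the slow processor.
        warnings.warn(
            "`use_fast=True` was passed but no fast image processor is "
            "available for this model; falling back to the slow one."
        )
    return slow_cls

# Hypothetical stand-ins for a model's registered processor classes
class SlowProcessor: ...
class FastProcessor: ...

print(resolve_image_processor_class(SlowProcessor, FastProcessor, use_fast=True).__name__)
# With no fast class registered, this now warns instead of raising:
print(resolve_image_processor_class(SlowProcessor, None, use_fast=True).__name__)
```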
Current plan is:
The deprecation cycle is needed because there are slight differences in outputs when using torchvision vs PIL; see this PR https://github.com/huggingface/transformers/pull/34785 for more info.
What does this PR do?
This PR implements a fast image processor for Pixtral, following issue https://github.com/huggingface/transformers/issues/33810.
The key acceleration comes from replacing Pillow/NumPy tensors and functions (resize, rescale, normalize) with torch tensors and torchvision v2 functions. It comes along with support for `torch.compile` and for passing `device="cuda"` during inference to process the input on GPU. One limitation is that only `return_tensors="pt"` will be supported.

Usage
From simple benchmarking with a single image of size `[3, 876, 1300]`, I see a 6x to 10x speedup.

Before submitting
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.