Open NielsRogge opened 10 months ago
@NielsRogge Thanks for opening the issue!
It's fine to open this up to the community, but you'll need to add a checklist of the image processors so it's clear who is working on what and what's done. Ideally, also add some instructions on what it means for each one to be "done", e.g. making sure to run the slow tests for the models.
@NielsRogge,
If I understand correctly, we need to match the interpolation used by the original implementation.
For example, ConvNeXT's default should be changed to bicubic, as per timm/convnext.
If that's correct, I can take this up for all the models. Let me know.
Yes, that is correct; see also the original implementation. Thanks for spotting that. Feel free to open a PR to update this, along with the image processor created in the conversion script. Ideally we assert the pixel values it creates against the original implementation, as done here for DINOv2.
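A minimal sketch of what such a pixel-value check could look like, using NumPy only. The helper name `assert_pixel_values_match`, the tolerance, and the placeholder arrays are assumptions made for illustration; they are not taken from the DINOv2 conversion script.

```python
import numpy as np


def assert_pixel_values_match(candidate, reference, atol=1e-4):
    """Hypothetical helper: compare two preprocessed image arrays.

    Raises AssertionError if the shapes differ or if any element differs
    by more than `atol`. Returns the maximum absolute difference otherwise.
    """
    candidate = np.asarray(candidate, dtype=np.float32)
    reference = np.asarray(reference, dtype=np.float32)
    if candidate.shape != reference.shape:
        raise AssertionError(
            f"shape mismatch: {candidate.shape} vs {reference.shape}"
        )
    max_diff = float(np.abs(candidate - reference).max())
    if max_diff > atol:
        raise AssertionError(f"max abs difference {max_diff} exceeds atol={atol}")
    return max_diff


# Placeholder arrays standing in for the image processor's output and the
# original implementation's output (batch, channels, height, width).
reference = np.zeros((1, 3, 4, 4), dtype=np.float32)
candidate = reference.copy()
print(assert_pixel_values_match(candidate, reference))
```

In a real conversion script, `candidate` would be the image processor's pixel values and `reference` the tensor produced by the original preprocessing pipeline for the same image.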
Sure! Thanks for the pointers, I'll work on it.
As far as I can see, the default interpolation types for DeiT and DPT already match the original implementations (BICUBIC). Let me know if I overlooked something.
@NielsRogge,
Could you have a look?
@NielsRogge Can you please complete the checklist here?
@NielsRogge I would like to take this issue up.
@NielsRogge, @nileshkokane01 Can I work on this issue and help complete the checklist?
@NielsRogge, is it possible for me to contribute? Thanks.
Feature request
As pointed out in https://github.com/huggingface/transformers/pull/27742, some image processors might need a correction on the default interpolation method being used (resampling in Pillow). We could check this on a per-model basis.
Motivation
Interpolation methods have a slight (often minimal) impact on performance. However, it would be great to verify this on a per-model basis.
For example, ViT's image processor defaults to BILINEAR but should use BICUBIC, as seen here. We can update the default values of the image processors, but we can't update the configs on the hub, as this would break people's fine-tuned models.
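To illustrate why the resampling default matters, here is a small sketch that resizes the same image with Pillow's BILINEAR and BICUBIC filters and measures how far the resulting pixel values drift apart. The synthetic noise image and the 64-to-32 resize are assumptions chosen only for illustration; noise exaggerates the gap compared with natural images.

```python
import numpy as np
from PIL import Image

# Deterministic synthetic RGB image standing in for a real input.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
image = Image.fromarray(pixels)

size = (32, 32)  # hypothetical target size (width, height)

# Resize the same image with the two resampling filters under discussion.
bilinear = np.asarray(image.resize(size, resample=Image.BILINEAR), dtype=np.float32)
bicubic = np.asarray(image.resize(size, resample=Image.BICUBIC), dtype=np.float32)

# Quantify the difference on the 0-255 scale.
abs_diff = np.abs(bilinear - bicubic)
max_abs_diff = float(abs_diff.max())
mean_abs_diff = float(abs_diff.mean())
print(f"max abs diff: {max_abs_diff:.1f}, mean abs diff: {mean_abs_diff:.2f}")
```

A non-zero difference here means a model fed with the "wrong" filter sees slightly different inputs than it was trained on, which is the mismatch this issue proposes to check per model.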
Your contribution
I could work on this, but this seems like a good first issue for first-time contributors.
To be checked (by comparing against original implementation):