huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers

Expose default_to_square parameter in CLIPImageProcessor #23669

Closed: shubhamgoel27 closed this issue 1 year ago

shubhamgoel27 commented 1 year ago

I'm looking to train an image model without cropping the input's sides (either horizontally or vertically). But I noticed that in the CLIPImageProcessor class, the default_to_square parameter is hard-coded to False. Is there any way I can modify this so that my input is not cropped as a result of the resize and center_crop combination of transforms?

sgugger commented 1 year ago

cc @amyeroberts

amyeroberts commented 1 year ago

Hi @shubhamgoel27,

default_to_square is used in get_size_dict in order to control the behaviour when converting old configuration values (ints, tuples, or lists) to the expected dictionary format for the size parameter. As such, it's tied to the image processor class and isn't meant to be modified.
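For a concrete picture, here is a minimal sketch of that conversion (assuming a recent transformers version, where get_size_dict lives in transformers.image_processing_utils):

from transformers.image_processing_utils import get_size_dict

# With default_to_square=True, an int becomes a square (height, width) dict:
print(get_size_dict(224, default_to_square=True))   # {'height': 224, 'width': 224}

# With default_to_square=False, an int is treated as the shortest edge:
print(get_size_dict(224, default_to_square=False))  # {'shortest_edge': 224}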

If I've understood correctly, you'd like to use the CLIPImageProcessor but not perform resizing or cropping of the images. For all image processors, every transformation can be turned on or off with the do_xxx flags, either at instantiation or when calling the processor. To skip resizing and cropping of the input images:

from transformers import CLIPImageProcessor

# Load the checkpoint's processor config with from_pretrained, then turn off
# the resize and center-crop transforms at call time.
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
inputs = image_processor(images=images, do_resize=False, do_center_crop=False)

Note: if do_resize=False and do_center_crop=False, then all the input images must be of the same (height, width) dimensions in order to create a batch.
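For example, a minimal sketch of batching under these flags (the file paths are placeholders, and return_tensors="pt" assumes torch is installed):

from PIL import Image
from transformers import CLIPImageProcessor

image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Both images must already share the same (height, width) to stack into one tensor.
images = [Image.open("first.jpg"), Image.open("second.jpg")]  # placeholder paths
inputs = image_processor(images=images, do_resize=False, do_center_crop=False, return_tensors="pt")
print(inputs["pixel_values"].shape)  # (2, 3, height, width)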

shubhamgoel27 commented 1 year ago

Hey @amyeroberts ,

Thanks for the swift response.

My use-case is to not crop the image during the resize step, but still resize it to a smaller size (e.g. 224x224). So if the original image is 576x1024, the resize method would stretch/squeeze whichever dimensions necessary and return a 224x224 image. But since the default_to_square parameter is hard-coded to False, I couldn't find a way to do so using the CLIPImageProcessor.
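For illustration, the desired behaviour expressed directly in PIL (a sketch of the target transform, not the CLIPImageProcessor API):

from PIL import Image

image = Image.new("RGB", (1024, 576))  # PIL sizes are (width, height)
resized = image.resize((224, 224))     # stretch/squeeze both dimensions, no crop
print(resized.size)                    # (224, 224)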

P.S. The context around this is that I don't want to crop useful information out from either side (horizontal or vertical) during the pre-processing stage, as it might have a lot of value for the domain I'm interested in.

amyeroberts commented 1 year ago

@shubhamgoel27 Is there a reason that you specifically want to use CLIP's image processor? All of the image processors are implemented to be aligned with the processing in the model paper, so it's not always possible to adapt them to every need. For your use case, the simplest approach would be to use another model's image processor, specifically ViT's. This image processor does three simple transformations: resizing the image to a fixed (height, width) (224x224 by default), rescaling the pixel values, and normalizing with a given mean and standard deviation. Because the resize targets an explicit (height, width), it stretches the image rather than cropping it.

If it's necessary to have the same normalization constants as those used in CLIP, these can be passed in when instantiating the class, e.g.:

from transformers import ViTImageProcessor
from transformers.utils.constants import OPENAI_CLIP_MEAN, OPENAI_CLIP_STD

# ViT's processor resizes straight to (224, 224) with no cropping; reuse
# CLIP's normalization statistics so the model sees familiarly scaled inputs.
image_processor = ViTImageProcessor(image_mean=OPENAI_CLIP_MEAN, image_std=OPENAI_CLIP_STD)
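
As a quick sanity check (a sketch; assumes torch is installed for return_tensors="pt"):

from PIL import Image

image = Image.new("RGB", (1024, 576))  # non-square input
inputs = image_processor(images=image, return_tensors="pt")
print(inputs["pixel_values"].shape)    # torch.Size([1, 3, 224, 224]), no cropping
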
shubhamgoel27 commented 1 year ago

@amyeroberts I'm finetuning the ViT component of a CLIP model, which is why I was trying to use CLIPImageProcessor. But it looks like ViTImageProcessor allows specifying both height and width in the resize method without needing default_to_square=False. So that should most likely be enough for my use-case. Thanks for pointing it out :)