openai / CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
MIT License
26.1k stars, 3.33k forks

Preprocessor - How does it work? #459

Open whishei opened 3 months ago

whishei commented 3 months ago

I have been exploring the source code of different preprocessors for ViT models, and I am struggling to understand the exact preprocessing steps an image goes through in the ImageProcessor / CLIPProcessor. How is the image resized to 224x224? Is it similar to skimage.transform.resize(), which up-samples or down-samples depending on the image size? Thanks in advance.

Looking into image_processing_clip.py in Hugging Face Transformers, I found the following explanation: "Resize an image. The shortest edge of the image is resized to size["shortest_edge"], with the longest edge resized to keep the input aspect ratio."
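
To make that docstring concrete, here is a minimal sketch of the size arithmetic as I read it. It is not the library's code: the function names `shortest_edge_size` and `center_crop_box` are hypothetical, the truncating `int()` rounding is an assumption, and the center-crop step is my understanding of how the resized image ends up exactly 224x224 rather than something stated in the quoted docstring.

```python
def shortest_edge_size(width, height, shortest_edge=224):
    """Return (new_width, new_height) so that the shorter side equals
    shortest_edge and the aspect ratio is kept.

    Assumption: the longer side is truncated with int(); the real
    implementation may round differently.
    """
    if width <= height:
        return shortest_edge, int(shortest_edge * height / width)
    return int(shortest_edge * width / height), shortest_edge


def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a centered crop x crop box.

    Assumption: integer floor division decides where the odd pixel goes.
    """
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


if __name__ == "__main__":
    # A 640x480 image is scaled down; a 100x50 image is scaled up.
    for w, h in [(640, 480), (100, 50)]:
        new_w, new_h = shortest_edge_size(w, h)
        print((w, h), "->", (new_w, new_h), "crop", center_crop_box(new_w, new_h))
```

Under these assumptions the resize is a single rescale by 224 / min(width, height), so it up-samples small images and down-samples large ones, and only the crop discards pixels along the longer side.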