I have been exploring the source code of different preprocessors for ViT models, and I am struggling to understand the exact preprocessing steps an image goes through in the ImageProcessor/CLIPProcessor. How is the image resized to 224x224? Is it similar to skimage.transform.resize(), where it performs either up-sampling or down-sampling depending on the image size? Thanks in advance.
Looking into image_processing_clip.py on the Hugging Face repository, I found the following docstring:
"Resize an image. The shortest edge of the image is resized to size["shortest_edge"], with the longest edge resized to keep the input aspect ratio."
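So the resize is aspect-ratio-preserving rather than a direct stretch to 224x224: the shortest edge is scaled to 224 and the other edge follows proportionally (the subsequent center crop then produces the final 224x224 input). Here is a minimal sketch of that size computation; the function name `shortest_edge_resize_size` is my own, not from the library:

```python
def shortest_edge_resize_size(height, width, shortest_edge=224):
    """Compute the output (height, width) when the shortest edge is
    resized to `shortest_edge` and the aspect ratio is preserved.
    The actual interpolation in the library is done afterwards
    (e.g. PIL bicubic resampling), not shown here."""
    if height <= width:
        # height is the shortest edge: pin it, scale width proportionally
        return shortest_edge, round(width * shortest_edge / height)
    # width is the shortest edge: pin it, scale height proportionally
    return round(height * shortest_edge / width), shortest_edge


# Example: a 480x640 (HxW) landscape image
print(shortest_edge_resize_size(480, 640))  # → (224, 299)
```

After this resize, my understanding of the CLIP pipeline is that a 224x224 center crop is taken, pixel values are rescaled by 1/255, and the result is normalized with CLIP's mean/std, so both up-sampling and down-sampling can occur depending on the input size, similar in spirit to skimage.transform.resize() but with the aspect ratio kept.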