salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
BSD 3-Clause "New" or "Revised" License

Why the resize does not preserve the original aspect ratio #119

Open yurymalkov opened 1 year ago

yurymalkov commented 1 year ago

Hi, thank you for the work!

I've played with the code and noticed that the examples do not preserve the original aspect ratio during the resize. E.g. https://github.com/salesforce/BLIP/blob/d10be550b2974e17ea72e74edc7948c9e5eab884/predict.py#L93 or the colab example.

I wonder if this is intentional?
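To illustrate the behavior being asked about: resizing both dimensions to a fixed square (as the linked transform does) stretches non-square images. A minimal sketch with PIL only, assuming an input size of 384 (the resolution BLIP commonly uses; the exact value in predict.py may differ):

```python
from PIL import Image

image_size = 384  # assumed; BLIP checkpoints commonly use 384x384

# A non-square 4:3 input.
img = Image.new("RGB", (640, 480))

# Forcing both dimensions to image_size distorts the aspect ratio:
# the 4:3 image becomes 1:1.
resized = img.resize((image_size, image_size), Image.BICUBIC)
print(resized.size)  # (384, 384)
```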

saffie91 commented 1 year ago

I came to ask the same question: is there no need for letterboxing?
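For context, letterboxing here means scaling so the longer side fits and padding the rest, which preserves the aspect ratio. A hypothetical sketch (not part of BLIP; `letterbox` is an illustrative helper):

```python
from PIL import Image

def letterbox(img, size, fill=(0, 0, 0)):
    """Resize so the longer side equals `size`, then pad to a square."""
    w, h = img.size
    scale = size / max(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    resized = img.resize((new_w, new_h), Image.BICUBIC)
    # Paste centered on a square canvas; the rest is padding.
    canvas = Image.new("RGB", (size, size), fill)
    canvas.paste(resized, ((size - new_w) // 2, (size - new_h) // 2))
    return canvas

out = letterbox(Image.new("RGB", (640, 480)), 384)
print(out.size)  # (384, 384), content scaled to 384x288 inside the padding
```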

LiJunnan1992 commented 1 year ago

Thanks for your question. It is not necessary to preserve the original aspect ratio because the model was trained on augmented image crops with various ratios. We find this transform to perform slightly better than CenterCrop during inference.
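For comparison, the CenterCrop alternative mentioned above scales the short side and crops the center square, which keeps the aspect ratio but discards content at the edges. A PIL-only sketch of that idea (equivalent in spirit to torchvision's `Resize` + `CenterCrop`, written out here as an illustrative helper):

```python
from PIL import Image

def resize_center_crop(img, size):
    """Scale the short side to `size`, then crop the center square."""
    w, h = img.size
    scale = size / min(w, h)
    resized = img.resize((round(w * scale), round(h * scale)), Image.BICUBIC)
    # Crop a size x size window from the center; edge content is dropped.
    w, h = resized.size
    left, top = (w - size) // 2, (h - size) // 2
    return resized.crop((left, top, left + size, top + size))

out = resize_center_crop(Image.new("RGB", (640, 480)), 384)
print(out.size)  # (384, 384); a 640x480 input loses 64px on each side
```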