magic-research / PLLaVA

Official repository for the paper PLLaVA

Can PLLaVA caption rectangular videos without cropping? #38

Open tomyoung903 opened 2 months ago

tomyoung903 commented 2 months ago

I notice that the image processor crops rectangular images into square images, which inevitably loses some information.

It seems that cropping is also used during training.

What if we want to caption rectangular videos without losing edges to cropping?
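For a rough sense of how much a center crop discards, here is a minimal sketch. It assumes the processor keeps the largest centered square of each frame (typical CLIP-style preprocessing; the exact PLLaVA processor settings are not shown in this thread). The helper names `center_crop_box` and `fraction_lost` are hypothetical and are not part of the repository.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def fraction_lost(width, height):
    """Fraction of the frame's pixels that fall outside the centered square."""
    side = min(width, height)
    return 1 - (side * side) / (width * height)


if __name__ == "__main__":
    # Example: a 16:9 frame. The crop keeps only the middle 1080 columns.
    print(center_crop_box(1920, 1080))
    print(f"{fraction_lost(1920, 1080):.2%} of pixels discarded")
```

Under this assumption the loss depends only on the aspect ratio, so wider videos lose proportionally more of their left and right edges.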

ermu2001 commented 1 month ago

Yes, we conducted all our experiments with cropping.

Padding would retain all the information, but it would lower the effective resolution and would not align with the training strategy of the image models (CLIP and LLaVA-NeXT).
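The resolution trade-off can be illustrated with a small sketch. It assumes a square model input of 336 pixels (an assumed value, not confirmed in this thread) and compares the scale factor applied to the frame when padding (fit the longer side) against cropping (fit the shorter side). The function names are hypothetical.

```python
def pad_to_square_geometry(width, height, target=336):
    """Letterbox geometry: fit the longer side to `target`, pad the rest.

    `target=336` is an assumed input size, used only for illustration.
    """
    scale = target / max(width, height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    # Offset of the resized frame inside the padded square canvas.
    pad_x = (target - new_w) // 2
    pad_y = (target - new_h) // 2
    return {"scale": scale, "size": (new_w, new_h), "offset": (pad_x, pad_y)}


def crop_scale(width, height, target=336):
    """Scale factor when the shorter side is fit to `target` before cropping."""
    return target / min(width, height)


if __name__ == "__main__":
    geom = pad_to_square_geometry(1920, 1080)
    print(geom)
    print("crop scale:", crop_scale(1920, 1080))
```

With padding, the whole frame survives but each object is rendered with fewer pixels, and part of the input is spent on empty borders, which is the downgrade described above.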