thunlp / LLaVA-UHD

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Question about the adaptive size part #19

Open lucasjinreal opened 3 months ago

lucasjinreal commented 3 months ago

I noticed that the image processing step actually resizes the images back to 336×336 and feeds them into CLIP.

But why does CLIP still need to interpolate? I don't get it, since each image is now just a normal-sized image.
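
To make the question concrete, here is a minimal sketch of the arithmetic behind it. It assumes that "interpolate" refers to resizing the position embeddings of a ViT-style CLIP encoder, and that the encoder uses patch size 14 and was pretrained at 336×336. Those values and the helper names `patch_grid` and `needs_interpolation` are assumptions for illustration, not code from this repository.

```python
def patch_grid(height: int, width: int, patch_size: int = 14) -> tuple:
    """Return the (rows, cols) patch grid a ViT-style encoder would see.

    patch_size=14 is an assumed value, not read from the repository.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return height // patch_size, width // patch_size


def needs_interpolation(height: int, width: int,
                        pretrain_size: int = 336,
                        patch_size: int = 14) -> bool:
    """True if the input patch grid differs from the (assumed) pretraining grid."""
    pretrain_grid = patch_grid(pretrain_size, pretrain_size, patch_size)
    return patch_grid(height, width, patch_size) != pretrain_grid


# A 336x336 input yields the same grid as the assumed pretraining
# resolution, so the position embeddings would already match.
print(patch_grid(336, 336))
print(needs_interpolation(336, 336))

# A non-square input with a different grid would need its position
# embeddings resized.
print(needs_interpolation(448, 224))
```

Under these assumptions, interpolation would only matter when the input's patch grid differs from the pretraining grid, which is why the 336×336 case looks like it should not need it.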