thunlp / LLaVA-UHD

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

About the slice_logic #7

Closed by power0341 3 months ago

power0341 commented 3 months ago

The result of slicing looks very weird (I haven't read the paper, so I may be completely wrong 🙄). OG: 9cd8aab1f2ddbf6f15867247d4331016

then the 7 patches: image0 image1 image2 image3 image4 image5 image6

yfzhang114 commented 3 months ago

So what is the reason?

power0341 commented 3 months ago

@yfzhang114 https://github.com/thunlp/LLaVA-UHD/blob/69e75d0cc6bc4d6000045f08f94852d2d465cd91/llava_uhd/train/llava-uhd/adapt_clip.py#L272