thunlp / LLaVA-UHD

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Training the Resampler with SigLIP: loss does not converge #22

Open lucasjinreal opened 5 months ago

lucasjinreal commented 5 months ago

Hi, I adapted this Resampler module to LLaVA without slicing and replaced the vision encoder, switching from CLIP to SigLIP, but the loss does not converge.
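For reference, here is a minimal sketch of what the encoder swap changes on the resampler's input side. It is a hypothetical single-head, perceiver-style resampler written in NumPy, not the LLaVA-UHD implementation. The class name `MinimalResampler`, the dimensions, and the init scale are illustrative assumptions. It also assumes CLIP ViT-L/14-336 yields 576 patch tokens of width 1024 (CLS dropped) and SigLIP so400m/14-384 yields 729 tokens of width 1152. Under those assumptions, anything sized for CLIP's 24×24 grid or 1024-dim features, such as the input projection or a fixed positional embedding if the resampler uses one, will not fit SigLIP features as-is.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-6):
    """Parameter-free LayerNorm over the last dimension."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


class MinimalResampler:
    """Hypothetical single-head perceiver-style resampler (not LLaVA-UHD's code).

    A fixed set of learnable queries cross-attends to the vision encoder's
    patch features, so the output always has `num_queries` tokens no matter
    how many patches the encoder emits.
    """

    def __init__(self, kv_dim, embed_dim, num_queries, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            # Small Gaussian init; the 0.02 scale is an illustrative choice.
            return rng.normal(0.0, 0.02, shape)

        self.kv_dim = kv_dim
        self.embed_dim = embed_dim
        self.queries = init(num_queries, embed_dim)
        # Input projection: its first dimension must equal the encoder's hidden size.
        self.kv_proj = init(kv_dim, embed_dim)
        self.w_q = init(embed_dim, embed_dim)
        self.w_k = init(embed_dim, embed_dim)
        self.w_v = init(embed_dim, embed_dim)

    def __call__(self, feats):
        # feats: (num_patches, kv_dim) patch features from the vision encoder.
        if feats.shape[-1] != self.kv_dim:
            raise ValueError(
                f"encoder width {feats.shape[-1]} != resampler kv_dim {self.kv_dim}"
            )
        kv = layer_norm(feats @ self.kv_proj)  # normalize the projected features
        q = layer_norm(self.queries) @ self.w_q
        k = kv @ self.w_k
        v = kv @ self.w_v
        # Scaled dot-product cross-attention: (num_queries, num_patches)
        attn = softmax(q @ k.T / np.sqrt(self.embed_dim))
        return attn @ v  # (num_queries, embed_dim)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed SigLIP so400m/14-384 output: 729 patches x 1152 channels.
    siglip_feats = rng.standard_normal((729, 1152))
    resampler = MinimalResampler(kv_dim=1152, embed_dim=256, num_queries=64)
    print(resampler(siglip_feats).shape)  # (64, 256)
```

The explicit width check turns a silent shape mismatch into an immediate error. Because the queries fix the output length, the 576 → 729 change in patch count does not affect what the LLM sees.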

Any thoughts on this?