Training the Resampler with SigLIP: loss does not converge

thunlp / LLaVA-UHD

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images


Open lucasjinreal opened 5 months ago

lucasjinreal commented 5 months ago

Hi, I adapted this Resampler module to LLaVA without image slicing and replaced the vision encoder, swapping CLIP for SigLIP. With this setup, the loss does not converge.
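For anyone trying to reproduce this, here is a minimal sketch of a shape check that might help narrow things down. It is not a confirmed cause, only something that could differ between the two encoders. `EncoderSpec`, `output_tokens`, and `check_resampler_inputs` are hypothetical helpers, not part of this repo, and the two example specs are assumptions about commonly used public checkpoints (values should be verified against the actual model config); the checkpoint used in this setup is not stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderSpec:
    """Hypothetical description of a ViT-style vision encoder."""
    name: str
    image_size: int      # input resolution in pixels (square)
    patch_size: int      # patch edge length in pixels
    hidden_size: int     # width of each output token
    has_cls_token: bool  # whether a class token precedes the patch tokens


def output_tokens(spec: EncoderSpec) -> int:
    """Number of tokens emitted: patch grid plus an optional class token."""
    grid = spec.image_size // spec.patch_size
    return grid * grid + (1 if spec.has_cls_token else 0)


def check_resampler_inputs(spec: EncoderSpec, kv_dim: int,
                           drop_first_token: bool) -> list[str]:
    """Return a list of mismatches between encoder and resampler settings."""
    problems = []
    if spec.hidden_size != kv_dim:
        problems.append(
            f"{spec.name}: hidden size {spec.hidden_size} != resampler kv_dim {kv_dim}"
        )
    if drop_first_token and not spec.has_cls_token:
        problems.append(
            f"{spec.name}: first token is dropped, but the encoder has no class token"
        )
    return problems


# Assumed values for two public checkpoints; verify against the model config.
CLIP_L14_336 = EncoderSpec("clip-vit-large-patch14-336", 336, 14, 1024, True)
SIGLIP_SO400M_384 = EncoderSpec("siglip-so400m-patch14-384", 384, 14, 1152, False)

if __name__ == "__main__":
    for spec in (CLIP_L14_336, SIGLIP_SO400M_384):
        print(spec.name, output_tokens(spec),
              check_resampler_inputs(spec, kv_dim=1024, drop_first_token=True))
```

If the list comes back empty for the SigLIP configuration, the shapes are probably not the issue and the learning rate or initialization of the Resampler would be the next thing to look at.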

Any thoughts on this?