Suggestions for future model

vikhyat / moondream

tiny vision language model

Apache License 2.0

4.85k stars, 431 forks

Thank you for your model!

If you want inspiration for future improvements to moondream, check out qwen-vl-max, which I think is the best multimodal model at the moment.

If you want to squeeze more detail out of your model, maybe you could have it automatically slice the image into several pieces, caption each slice, caption the whole image, and then combine all of the captions into a single caption.

I do this manually for problem images (cropping the image several times in Photoshop), and it gets the model to see details it would normally miss or ignore.
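Here is a rough sketch of what the automatic version of that workflow could look like. It is only an illustration, not moondream's actual API: `caption_fn` is a hypothetical stand-in for whatever call produces a caption from an image, the image is assumed to expose `.size` and `.crop(box)` the way a Pillow image does, and the "combine" step is just a plain text join (a real version might feed the captions back to a model to merge them).

```python
from typing import Callable, List, Tuple

# (left, upper, right, lower), the box layout Pillow's Image.crop expects
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, rows: int = 2, cols: int = 2) -> List[Box]:
    """Split a width x height image into a rows x cols grid of crop boxes."""
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be >= 1")
    boxes = []
    for r in range(rows):
        for c in range(cols):
            # Integer arithmetic keeps the tiles adjacent with no gaps,
            # even when the size is not evenly divisible.
            left = c * width // cols
            upper = r * height // rows
            right = (c + 1) * width // cols
            lower = (r + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes


def caption_with_slices(
    image,
    caption_fn: Callable[[object], str],  # hypothetical: image -> caption text
    rows: int = 2,
    cols: int = 2,
) -> str:
    """Caption each slice, caption the whole image, then join the results."""
    width, height = image.size
    slice_captions = [
        caption_fn(image.crop(box)) for box in tile_boxes(width, height, rows, cols)
    ]
    whole_caption = caption_fn(image)
    # Assumption: combining is a simple labelled join, one line per caption.
    lines = [f"Overall: {whole_caption}"]
    lines += [f"Region {i + 1}: {text}" for i, text in enumerate(slice_captions)]
    return "\n".join(lines)
```

With a Pillow image this would be called as `caption_with_slices(img, my_caption_fn)`, where `my_caption_fn` wraps the model. Overlapping the tiles slightly might help with objects that straddle a slice boundary, but I have not tested that.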

Suggestions for future model #53