Open guoqingbao opened 3 months ago
A step in this direction is @santiagomed adding moondream to candle-transformers, see this readme.
Fantastic! Thanks for the new model!
I'll give llava a shot. Would be great to have more multi-modal models in here.
EDIT: Been busy but still want to work on this. Will pop into discord to chat with folks about how to approach this.
I have implemented LLaVA at candle-llava. Will contribute to this project soon.
Sounds great, looking forward to having this included!
Do you have any plans to support multimodal LLMs, such as MiniGPT-4/MiniGPT v2 (https://github.com/Vision-CAIR/MiniGPT-4/) and LLaVA (https://github.com/haotian-liu/LLaVA/)? Supporting these popular multimodal LLMs in Candle would be a significant enhancement. I believe these multimodal LLMs are built upon a Vision Transformer (ViT) encoder (CLIP, BLIP, etc.) and a foundational language model such as LLaMA or Mistral, both of which are already supported in Candle.
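To make the architecture concrete: in LLaVA-style models, the vision encoder turns an image into patch embeddings, a small projection layer maps those into the language model's embedding space, and the projected "visual tokens" are prepended to the text-token embeddings before the LLM runs. Here is a minimal, self-contained Rust sketch of that projection-and-concatenation step. All dimensions and names are illustrative; this is not candle's actual API, just the shape of the idea.

```rust
// Sketch of a LLaVA-style multimodal projector: image patch embeddings from a
// vision tower (e.g. CLIP ViT) are linearly projected into the LLM's
// token-embedding space and prepended to the text tokens. Dimensions here are
// tiny and illustrative; in practice `project` would be a learned Linear layer.

/// Multiply each (in_dim)-sized patch embedding by an (in_dim x out_dim)
/// projection matrix, yielding out_dim-sized "visual tokens".
fn project(patches: &[Vec<f32>], w: &[Vec<f32>], out_dim: usize) -> Vec<Vec<f32>> {
    patches
        .iter()
        .map(|p| {
            (0..out_dim)
                .map(|j| p.iter().zip(w).map(|(x, row)| x * row[j]).sum())
                .collect()
        })
        .collect()
}

fn main() {
    let vision_dim = 4; // stand-in for the CLIP hidden size
    let llm_dim = 3; // stand-in for the LLaMA hidden size

    // Two image-patch embeddings from the vision tower.
    let patches = vec![vec![1.0; vision_dim], vec![0.5; vision_dim]];
    // Projection weights (learned in the real model).
    let w = vec![vec![0.1; llm_dim]; vision_dim];

    let visual_tokens = project(&patches, &w, llm_dim);

    // Text-token embeddings that would come from the LLM's embedding table.
    let text_tokens = vec![vec![0.0; llm_dim]; 5];

    // The multimodal input sequence: visual tokens first, then text tokens.
    let seq: Vec<Vec<f32>> = visual_tokens.into_iter().chain(text_tokens).collect();
    println!("sequence length = {}, hidden dim = {}", seq.len(), seq[0].len());
}
```

Since candle already ships CLIP, LLaMA, and Mistral, most of the new work for a model like LLaVA is this projector plus the tokenizer plumbing that interleaves image and text positions.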