vikhyat / moondream

tiny vision language model
https://moondream.ai
Apache License 2.0
4.9k stars · 437 forks

Try openchat LLM instead of Phi1.5 #29

Open sujitvasanth opened 7 months ago

sujitvasanth commented 7 months ago

Hi, it's a truly amazing model, but I was disappointed by the instructability of the Phi-1.5 element of your model. For instance, if asked to ignore a particular object in a photo, it doesn't follow the instruction; when asked to write a certain number of words, it doesn't do so reliably; and when asked to start a new paragraph, it doesn't do that either. It also doesn't respond well to text placed later in the prompt.
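As a concrete illustration, here is a minimal sketch of the kind of instruction-following probes I mean, assuming the vikhyatk/moondream1 checkpoint and the encode_image/answer_question helpers shown in this repo's README; the image path and prompts are made-up examples:

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vikhyatk/moondream1"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("photo.jpg")  # hypothetical test image
enc_image = model.encode_image(image)

# Instruction-following probes: constraints the Phi-1.5 decoder tends to ignore.
prompts = [
    "Describe this photo, but ignore the dog completely.",
    "Describe this photo in exactly ten words.",
    "Describe the foreground, then start a new paragraph about the background.",
]
for prompt in prompts:
    print(model.answer_question(enc_image, prompt, tokenizer))
```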

I recently reviewed a lot of models looking for one for a chatbot.

Nous-Hermes-2-SOLAR-10.7B https://huggingface.co/NousResearch/Nous-Hermes-2-SOLAR-10.7B

OpenChat3.5(0106) https://huggingface.co/openchat/openchat-3.5-0106 https://github.com/imoneoi/openchat

Both follow instructions much better, and I urge you to consider them for connecting your visual encoder.
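For context on what "connecting" would involve: moondream-style VLMs project the vision encoder's patch embeddings into the LLM's token-embedding space, so swapping Phi-1.5 for OpenChat would mean retraining that projector against the new model's hidden size. A rough sketch (a LLaVA-style two-layer MLP; the dimensions here are illustrative assumptions, not moondream's actual configuration):

```python
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Maps vision-encoder patch embeddings into the target LLM's
    embedding space. Dimensions are illustrative: 1152 for a
    SigLIP-style encoder, 4096 for a Mistral-based LLM like OpenChat."""
    def __init__(self, vision_dim: int = 1152, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_embeds: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, vision_dim) -> (batch, num_patches, llm_dim)
        return self.proj(patch_embeds)

# Projected patches are prepended to the text embeddings, so a new LLM
# needs this projector (re)trained for its hidden size.
projector = VisionProjector()
fake_patches = torch.randn(1, 729, 1152)
print(projector(fake_patches).shape)  # torch.Size([1, 729, 4096])
```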

axrwl commented 7 months ago

Their much larger memory requirements compared to Phi might defeat the purpose.
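Rough weight-memory arithmetic, using the published parameter counts (Phi-1.5 at 1.3B, OpenChat-3.5 at roughly 7B, SOLAR at 10.7B) and 2 bytes per parameter for fp16; activations and KV cache come on top:

```python
# fp16 weights only: params * 2 bytes. Real usage is higher.
for name, params_b in [("Phi-1.5", 1.3), ("OpenChat-3.5", 7.0), ("SOLAR", 10.7)]:
    print(f"{name}: ~{params_b * 2:.1f} GB fp16")
# Phi-1.5: ~2.6 GB, OpenChat-3.5: ~14.0 GB, SOLAR: ~21.4 GB
```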

sujitvasanth commented 7 months ago

@axrwl the latest OpenChat, quantised, takes only 4 GB: https://huggingface.co/openchat/openchat-3.5-0106. The main problem is that the only working quantisation for vision LLMs I've seen is bitsandbytes; the transformers library does not seem to support GPTQ in its image-to-text pipeline, despite supporting it for plain LLM inference.
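For reference, a minimal sketch of the bitsandbytes path that does work, assuming transformers with bitsandbytes and accelerate installed; 4-bit NF4 quantisation is what gets the 7B OpenChat weights down to roughly 4 GB:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantisation via bitsandbytes. GPTQ checkpoints, by contrast,
# load for plain text generation but not through the image-to-text pipeline.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "openchat/openchat-3.5-0106"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```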