vikhyat / moondream

tiny vision language model
https://moondream.ai
Apache License 2.0

Could you compare with MoE-LLaVA-1.6B×4-Top2? It seems better? #42

Closed · llziss4ai closed this 7 months ago

llziss4ai commented 7 months ago
| Model | Activated Params | Resolution | VQAv2 | GQA | VizWiz | T-VQA |
|---|---|---|---|---|---|---|
| MoE-LLaVA-1.6B×4-Top2 | 2.0B | 336 | 76.7 | 60.3 | 36.2 | 50.1 |
| moondream | 1.6B | 384 | 74.3 | 56.3 | 30.3 | 39.8 |

I just found these results at https://github.com/PKU-YuanGroup/MoE-LLaVA/tree/main?tab=readme-ov-file#-model-zoo

sujitvasanth commented 7 months ago

There are also https://huggingface.co/YouLiXiya/tinyllava-v1.0-1.1b-hf and https://huggingface.co/bczhou/tiny-llava-v1-hf.

Both run natively from Hugging Face transformers and can be quantized to 4-bit with bitsandbytes. They occupy 2-3 GB of VRAM and can presumably be fine-tuned using the LLaVA GitHub examples (see the sketch below).

Currently, MoE-LLaVA-1.6B×4-Top2 requires DeepSpeed for inference and can't be quantized, although the author is asking for help to do so.
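For reference, a minimal sketch of loading one of those checkpoints in 4-bit with transformers and bitsandbytes (untested; the image path is a placeholder and the prompt template is an assumption based on the usual LLaVA format, so check the model card):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

model_id = "bczhou/tiny-llava-v1-hf"

# 4-bit quantization via bitsandbytes keeps the model in the 2-3 GB VRAM range
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)

image = Image.open("extreme_ironing.jpg")  # placeholder image path
# Assumed LLaVA-style prompt template; the exact format may differ per model card
prompt = "USER: <image>\nWhat is unusual about this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```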

vikhyat commented 7 months ago

I couldn't get the code to run, so I can't reproduce these benchmarks.

sujitvasanth commented 7 months ago

@vikhyat There may be something to learn from MoE-LLaVA, as it utilises different LLM backbones, including Phi-2 and OpenChat; its mixture-of-experts architecture also seems to have reduced hallucinations.

I was able to get it running pretty easily: just clone the repo, cd into it, and run `deepspeed predict.py`. I had to redirect the image and model name in predict.py as below:

```python
# Edits to predict.py: point it at a local example image, a prompt, and a checkpoint
image = '/home/sujit/Downloads/MoE-LLaVA-main/moellava/serve/examples/extreme_ironing.jpg'
inp = 'What is unusual about this image?'
model_path = 'LanguageBind/MoE-LLaVA-StableLM-1.6B-4e'  # choose a model from the MoE-LLaVA model zoo
```