GPT-4-Vision and Gemini Vision multimodal model support?

youself64github commented 10 months ago

I would like to see vision chat battle and direct vision chat support added. GPT-4 Vision and Gemini Vision are multimodal models, and other multimodal models could be added along with them.

maninthemiddle01 commented 8 months ago

I'd also like to see this feature implemented! Adding vision chat battle and direct vision chat support with cutting-edge multimodal models would be incredibly exciting.

Maybe it would also be possible to support LLaVA-NeXT, MiniCPM-V, CogVLM Chat, QwenVL, InstructBLIP Vicuna 7B, and UForm-Gen2? These powerful models would enable fascinating conversations, collaborations, and insights. The WildVision vision-arena (https://huggingface.co/spaces/WildVision/vision-arena) shows one way this could be implemented.

I'm also curious what hurdles have kept this from being implemented so far.

dirtycomputer commented 6 months ago

+1

lm-sys / FastChat

GPT-4-Vision and Gemini Vision multimodal model support? #2881