Open youself64github opened 10 months ago
I'd also like to see this feature implemented! Adding vision chat battle and direct vision chat support with cutting-edge multimodal models would be incredibly exciting.
Maybe it would also be possible to implement LLaVA-NEXT, MiniCPM-V, CogVLM Chat, QwenVL, InstructBLIP Vicuna 7b, and UForm-Gen2? These powerful models would enable fascinating conversations, collaborations, and insights. The WildVision vision-arena (https://huggingface.co/spaces/WildVision/vision-arena) showcases how this could be implemented.
I'm also curious what hurdles there have been to implementing this so far.
+1
I want to add vision chat battle and direct vision chat support. GPT-4 Vision and Gemini Vision are multimodal models; other multimodal models could be added along with them.