pjrpjr opened 2 weeks ago
I second this. I have run the live demo locally with the provided inference code, and it produces considerably better tags/captions than any of the models in taggui that can run in 24 GB of VRAM. It's also very fast, even at FP16 (about 3 s/caption for me on a 3090); see the sketch below.
Easily my favorite model I have worked with. If I knew how to code in any capacity, I would try my best to help taggui support it.
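For anyone who wants to reproduce this, here is a minimal captioning sketch adapted from the usage example in the MiniCPM-V repo's README (linked below). The image path and prompt are placeholders, and the sampling settings are just the documented defaults:

```python
# Minimal FP16 captioning sketch for MiniCPM-Llama3-V-2.5, adapted from the
# MiniCPM-V repo's README. trust_remote_code is required because the model
# ships its own modeling code.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = 'openbmb/MiniCPM-Llama3-V-2_5'
model = AutoModel.from_pretrained(model_id, trust_remote_code=True,
                                  torch_dtype=torch.float16).to('cuda').eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open('example.jpg').convert('RGB')  # placeholder path
msgs = [{'role': 'user', 'content': 'Describe this image in detail.'}]

# model.chat() is the inference entry point documented in the repo.
caption = model.chat(image=image, msgs=msgs, tokenizer=tokenizer,
                     sampling=True, temperature=0.7)
print(caption)
```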
Have you tried the new model [GLM-4v-9B](https://huggingface.co/THUDM/glm-4v-9b), which was released a few days ago? It does better than any other model I have tried.
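If anyone wants to test it, here is a rough sketch following the usage shown on the GLM-4v-9B model card; I have not tried it in taggui, and the image path and prompt are placeholders:

```python
# Rough sketch following the GLM-4v-9B model card (untested here).
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/glm-4v-9b"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")  # placeholder path
# The model's custom chat template accepts an "image" key per its card.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "image": image, "content": "Describe this image."}],
    add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True,
).to("cuda")

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True, trust_remote_code=True,
).to("cuda").eval()

with torch.no_grad():
    outputs = model.generate(**inputs, max_length=2500,
                             do_sample=True, top_k=1)
    # Strip the prompt tokens before decoding.
    outputs = outputs[:, inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```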
Please add support for this model. https://github.com/OpenBMB/MiniCPM-V
This is the demo page for testing it: https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5
This model has a good combination of efficiency and quality for image captioning; see the OpenCompass leaderboard: https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Please add support for it.