jhc13 / taggui

Tag manager and captioner for image datasets
GNU General Public License v3.0
474 stars, 23 forks

Model Request: MiniCPM-Llama3-V2.5 #198

Open pjrpjr opened 2 weeks ago

pjrpjr commented 2 weeks ago

Please add support for this model. https://github.com/OpenBMB/MiniCPM-V

This is the demo page for testing it: https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5

This model offers a good combination of efficiency and quality for image captioning. See the leaderboard: https://huggingface.co/spaces/opencompass/open_vlm_leaderboard

Please add support for it.

SytanSD commented 4 days ago

I second this. I have run the live demo locally with the provided inference code, and it produces considerably better tags/captions than any of the models in taggui that can run in 24 GB of VRAM. It's also very fast, even at FP16 (about 3 s/caption for me on a 3090).

Easily my favorite model that I have worked with. If I knew how to code in any capacity, I would try my best to help taggui support it.

pjrpjr commented 4 days ago
[Screenshot 2024-07-01 153715]

Have you tried the new model [GLM-4v-9B](https://huggingface.co/THUDM/glm-4v-9b), which was released a few days ago? It does better than any other model.