pjrpjr opened 2 weeks ago
I second this. I have run the live demo locally with the provided inference code, and it produces considerably better tags/captions than any of the models in taggui that can run in 24 GB of VRAM. It's also very fast, even at FP16 (about 3 s/caption for me on a 3090); see the sketch below.
Easily my favorite model I have worked with. If I knew how to code in any capacity, I would try my best to help taggui support it.
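For anyone who wants to reproduce this, here is a minimal captioning sketch adapted from the usage example in the MiniCPM-V repo's README (linked below). The image path and prompt are placeholders, and the sampling settings are just the documented defaults:

```python
# Minimal FP16 captioning sketch for MiniCPM-Llama3-V-2.5, adapted from the
# MiniCPM-V repo's README. trust_remote_code is required because the model
# ships its own modeling code.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = 'openbmb/MiniCPM-Llama3-V-2_5'
model = AutoModel.from_pretrained(model_id, trust_remote_code=True,
                                  torch_dtype=torch.float16).to('cuda').eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open('example.jpg').convert('RGB')  # placeholder path
msgs = [{'role': 'user', 'content': 'Describe this image in detail.'}]

# model.chat() is the inference entry point documented in the repo.
caption = model.chat(image=image, msgs=msgs, tokenizer=tokenizer,
                     sampling=True, temperature=0.7)
print(caption)
```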
Have you tried the new model [GLM-4v-9B](https://huggingface.co/THUDM/glm-4v-9b), which was released a few days ago? It does better than any other model I have tried.
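If anyone wants to test it, here is a rough sketch following the usage shown on the GLM-4v-9B model card; I have not tried it in taggui, and the image path and prompt are placeholders:

```python
# Rough sketch following the GLM-4v-9B model card (untested here).
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/glm-4v-9b"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")  # placeholder path
# The model's custom chat template accepts an "image" key per its card.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "image": image, "content": "Describe this image."}],
    add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True,
).to("cuda")

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True, trust_remote_code=True,
).to("cuda").eval()

with torch.no_grad():
    outputs = model.generate(**inputs, max_length=2500,
                             do_sample=True, top_k=1)
    # Strip the prompt tokens before decoding.
    outputs = outputs[:, inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```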
Please add support for this model. https://github.com/OpenBMB/MiniCPM-V
This is the demo page for testing it: https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5
This model has a good combination of efficiency and quality for image captioning; see the OpenCompass leaderboard: https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Please add support for it.