Closed · gesen2egee closed this 3 months ago
I follow you completely, that's why I posted the same idea in #263 :)
I've implemented JoyCaption in my own fork if you want to give it a go (you'll need to install TagGUI manually from my fork, though), but there is one pretty big problem: JoyCaption requires Transformers 4.43+ (based on the official script), while CogVLM breaks with Transformers 4.42+. You can fix CogVLM by editing one of its Python scripts (there's a diff in the CogVLM2 LLaMA3 Chat repo on Hugging Face), but, y'know, you'd need to edit another model just to get JoyCaption working. Probably why jhc hasn't implemented it yet, tbh.
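One way to surface this conflict early is a small version gate before either model loads. This is just a hedged sketch: the 4.43 / 4.42 thresholds come from the comment above, and the helper names are made up, not anything from TagGUI.

```python
def parse_version(v: str) -> tuple:
    """Parse a dotted version string like '4.43.1' into a comparable tuple.
    Non-numeric suffixes (e.g. 'dev0') are simply dropped for this sketch."""
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())

# Thresholds reported in the thread (not verified against release notes).
JOYCAPTION_MIN = parse_version("4.43.0")  # JoyCaption needs Transformers 4.43+
COGVLM_BREAKS_AT = parse_version("4.42.0")  # CogVLM breaks at 4.42+

def model_compat(installed: str) -> dict:
    """Report which of the two models the installed Transformers supports."""
    iv = parse_version(installed)
    return {
        "joycaption": iv >= JOYCAPTION_MIN,
        "cogvlm": iv < COGVLM_BREAKS_AT,
    }
```

With the versions from the comment, no single install satisfies both models, which is the crux of the problem.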
Some models have already required me to edit their source code at runtime, so applying that fix to CogVLM could be doable.
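The runtime source edit could look something like the sketch below: load the downloaded model script, apply a substitution, and write it back only if something changed. The function name and the example pattern are illustrative; the real fix would be the diff published in the CogVLM2 LLaMA3 Chat repo.

```python
import re
from pathlib import Path

def patch_model_source(path: Path, pattern: str, replacement: str) -> bool:
    """Apply a one-off regex substitution to a downloaded model script.

    Returns True if the file was modified, False if the pattern did not
    match (e.g. the file was already patched). Pattern and replacement
    here are placeholders, not the actual CogVLM diff.
    """
    text = path.read_text(encoding="utf-8")
    patched = re.sub(pattern, replacement, text)
    if patched == text:
        return False
    path.write_text(patched, encoding="utf-8")
    return True
```

Returning False on a no-op keeps the patch idempotent, so it's safe to run on every model load.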
The main reasons I haven't added support for Joy Caption yet are:
Closing this issue as it's a duplicate of #263.
There's an example script demo in this thread.
Perhaps this is currently the best captioning model for NSFW content. It is based on a Llama 3.1 8B LoRA adapter and SigLIP. It is currently in pre-alpha, but both the detail of its descriptions and its accuracy are excellent. It can be used with the 4-bit BNB quantization of Llama 3.1.
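Loading the Llama 3.1 backbone in 4-bit via bitsandbytes would look roughly like this config fragment. The repo id is a placeholder for whatever checkpoint JoyCaption uses, and the quantization settings are common defaults, not values taken from the official script.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Placeholder checkpoint; substitute the actual Llama 3.1 model JoyCaption expects.
MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Typical 4-bit NF4 setup for bitsandbytes quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
```

Quantizing only the language model keeps VRAM usage manageable while leaving the SigLIP vision side untouched.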