jhc13 / taggui

Tag manager and captioner for image datasets
GNU General Public License v3.0
771 stars 37 forks

[Model Request] Joy Caption #266

Closed gesen2egee closed 3 months ago

gesen2egee commented 3 months ago

Links: thread, a script example, demo

This may currently be the best caption model for NSFW content. It is built on a LoRA adapter for Llama 3.1 8B plus SigLIP. It is still in pre-alpha, but both the level of detail in its descriptions and its accuracy are excellent. It can be used with the 4-bit BNB (bitsandbytes) quantized Llama 3.1.

StableLlama commented 3 months ago

I agree completely; that's why I posted the same idea in #263 :)

doloreshaze337 commented 3 months ago

I've implemented JoyCaption in my own fork if you want to give it a go (will need to manually install TagGUI from my fork, though), but there is one pretty big problem: JoyCaption requires Transformers 4.43+ (based on the official script) to work, but CogVLM breaks with Transformers 4.42+. You can fix CogVLM by editing one of its Python scripts (there's a diff in the CogVLM2 LLaMA3 Chat repo on HuggingFace), but, y'know, you need to edit another model to get JoyCaption to work. Probably why jhc hasn't implemented it yet, tbh.
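To make the conflict concrete, here is a minimal sketch of a version check. The two thresholds are taken directly from the comment above and are not independently verified; `check_compat` and `parse_version` are hypothetical helper names, not part of TagGUI or Transformers.

```python
import re

# Thresholds as reported in the comment above (assumptions, not verified).
JOYCAPTION_MIN = (4, 43)      # JoyCaption reportedly needs Transformers 4.43+
COGVLM_BROKEN_FROM = (4, 42)  # unpatched CogVLM reportedly breaks from 4.42 on


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '4.43.1' or '4.43.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def check_compat(transformers_version: str) -> dict[str, bool]:
    """Report which of the two models should work with this version."""
    version = parse_version(transformers_version)
    return {
        'joycaption': version >= JOYCAPTION_MIN,
        'cogvlm_unpatched': version < COGVLM_BROKEN_FROM,
    }
```

Under these assumed thresholds no single version makes both entries true, which is why CogVLM has to be patched before JoyCaption can be added. The installed version string can be read with `importlib.metadata.version('transformers')` and passed to `check_compat`.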

jhc13 commented 3 months ago

Some models have already required me to edit their source code at runtime, so applying that fix to CogVLM could be doable.
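For readers unfamiliar with the idea, editing a model's source code at runtime can be as simple as rewriting the downloaded Python file before it is imported. The following is only a generic sketch of that technique, not TagGUI's actual code; `patch_source` is a hypothetical helper, and the strings to replace would have to come from the real CogVLM diff, which is not reproduced here.

```python
from pathlib import Path


def patch_source(path: Path, old: str, new: str) -> bool:
    """Replace every occurrence of `old` with `new` in the file at `path`.

    Returns True if the file was changed, and False if `old` was not found
    (for example, because the patch was already applied).
    """
    source = path.read_text(encoding='utf-8')
    if old not in source:
        return False
    path.write_text(source.replace(old, new), encoding='utf-8')
    return True
```

The patch has to run before the model's module is imported, since Python will not pick up changes to an already loaded module. Calling it twice is harmless as long as `new` does not itself contain `old`.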

The main reasons I haven't added support for Joy Caption yet are:

  1. It's still in pre-alpha, and the creators of the model have given their approval to add it to TagGUI only once it's out of pre-alpha.
  2. I am busy right now and do not have much time to work on TagGUI.

jhc13 commented 3 months ago

Closing this issue as it's a duplicate of #263.