I found this model to be very creative. It is Llama 3 based and has been fine-tuned to produce captions as well as WD tags: https://huggingface.co/sdasd112132/Vision-8B-MiniCPM-2_5-Uncensored-and-Detailed-4bit
It was fine-tuned in Chinese, so it sometimes substitutes Chinese words. I combat this with a bad-words list and a system prompt telling it that it is an English captioner and that it should translate any non-English words into English.
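A rough Python sketch of how a bad-words list like this could be built. It assumes a Hugging Face-style tokenizer: walk the vocabulary, decode each token id, and ban any token whose text contains CJK characters. The `SYSTEM_PROMPT` wording, the Unicode ranges, and the helper names (`has_cjk`, `build_bad_words_ids`) are hypothetical illustrations, not the exact list or prompt described above. The result has the `List[List[int]]` shape that the `bad_words_ids` argument of Hugging Face `generate` expects. Whether this model's own chat helper forwards that argument is not confirmed here.

```python
from typing import Callable, List

# Hypothetical system prompt wording that paraphrases the idea above.
SYSTEM_PROMPT = (
    "You are an English image captioner. Write captions in English only "
    "and translate any non-English words into English."
)

# Unicode ranges treated as "Chinese" here (an assumption; extend as needed).
CJK_RANGES = [
    (0x3000, 0x303F),  # CJK symbols and punctuation
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xFF00, 0xFFEF),  # Halfwidth and fullwidth forms (e.g. "，")
]


def has_cjk(text: str) -> bool:
    """Return True if any character of text falls in one of CJK_RANGES."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in CJK_RANGES)


def build_bad_words_ids(
    vocab_size: int, decode: Callable[[List[int]], str]
) -> List[List[int]]:
    """Ban every single token whose decoded text contains CJK characters.

    Each entry is a one-token sequence, matching the List[List[int]] shape
    used by the `bad_words_ids` generation argument.
    """
    return [[i] for i in range(vocab_size) if has_cjk(decode([i]))]


if __name__ == "__main__":
    # Toy vocabulary standing in for a real tokenizer.
    toy_vocab = ["a", " cat", "猫", "，", " sitting", "坐"]
    toy_decode = lambda ids: "".join(toy_vocab[i] for i in ids)
    print(build_bad_words_ids(len(toy_vocab), toy_decode))  # [[2], [3], [5]]
```

With a real tokenizer you would pass something like `len(tokenizer)` and `tokenizer.decode`. Llama 3 uses byte-level BPE, so many Chinese characters are split across several tokens. Those fragments decode to a replacement character and slip past this filter, which is why it can also help to run `has_cjk` on the finished captions as a post-check.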
Additionally, it's quite good at determining art styles.