Most images in the fine-tuning corpus have a single clear subject, such as a character or a puppy. However, when the image content is more mixed, for example publicity posters that contain characters, scenery, prominent text (both Chinese and English), and even proper nouns such as China Petrochemical or China Mobile, fine-tuning does not work well. Are there specific requirements for the images used in the fine-tuning corpus?
This is an interesting and broad topic for diffusion models. Here are some suggestions for fine-tuning:
For training a subject LoRA (for example, a specific anime character), it is preferable to use a small set of images in which the subject's appearance stays consistent.
For training a style LoRA (for example, ink painting), the more images you can use, the better. Additionally, the LoRA rank should be high.
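To make the rank advice concrete, here is a minimal NumPy sketch of the LoRA update rule (the shapes and hyperparameters are hypothetical, not Kolors' actual training code). It shows why a higher rank gives the adapter more capacity: the number of trainable parameters grows linearly with the rank.

```python
import numpy as np

# Sketch of a LoRA update: W' = W + (alpha / r) * B @ A,
# where A (r x d_in) and B (d_out x r) are the only trained parameters
# and the base weight W stays frozen. Shapes are illustrative.
rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 320, 320, 64, 64  # higher rank -> more capacity for style

W = rng.normal(size=(d_out, d_in))              # frozen base weight
A = rng.normal(scale=0.01, size=(rank, d_in))   # trainable down-projection
B = np.zeros((d_out, rank))                     # trainable up-projection, zero-init

W_adapted = W + (alpha / rank) * B @ A          # equals W at init, since B is zero

# Trainable parameters per adapted layer: r * (d_in + d_out)
print(A.size + B.size)  # 40960 at rank 64; halving the rank halves this
```

Because B starts at zero, the adapted model is identical to the base model at initialization; training then moves only A and B, so raising the rank is a cheap way to give a style LoRA more expressive power.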
When synthesizing Chinese characters, the base model may struggle with generating long contexts, even if some models utilize powerful LLMs for encoding (such as the ChatGLM in Kolors). Current diffusion models are primarily capable of synthesizing simple, short texts.