modelscope / DiffSynth-Studio

Enjoy the magic of Diffusion models!
Apache License 2.0

What are the requirements for image content in Kolors fine-tuned corpus? #115

Open luoan7248 opened 1 month ago

luoan7248 commented 1 month ago

Most images in a fine-tuning corpus have a clear focus on a single subject, such as a character or a puppy. My images are more mixed: for example, publicity posters that contain characters, scenery, and very important text (in both Chinese and English), including proper nouns such as China Petrochemical and China Mobile. Fine-tuning on these does not work well. Are there specific requirements for the image content of the fine-tuning corpus?

Artiprocher commented 1 month ago

This is an interesting and broad topic in the realm of Diffusion models. We can offer some suggestions for fine-tuning:

  1. For training an instance LoRA (for example, an anime character), it is preferable to use several images that maintain a consistent appearance.
  2. For training a style LoRA (for example, ink painting), the more images you can use, the better. Additionally, the LoRA rank should be high.
  3. When synthesizing Chinese characters, the base model may struggle to generate long text, even when it uses a powerful LLM as its text encoder (such as ChatGLM in Kolors). Current diffusion models can mainly synthesize only short, simple text.
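
To make the point about LoRA rank more concrete, here is a minimal sketch of how rank controls the number of trainable parameters a LoRA adds to one linear layer. It relies only on the standard LoRA formulation (the update is a product of two low-rank matrices); the layer size and the ranks 4 and 64 are hypothetical values chosen for illustration, not settings recommended in this thread or taken from DiffSynth-Studio.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer of shape (d_out, d_in)."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    # LoRA learns W + B @ A, with A of shape (rank, d_in) and B of shape (d_out, rank).
    return rank * (d_in + d_out)


# Hypothetical layer size and rank choices, for illustration only.
D_IN, D_OUT = 1024, 1024
FULL = D_IN * D_OUT

for name, rank in [("character LoRA (few images)", 4), ("style LoRA (many images)", 64)]:
    added = lora_param_count(D_IN, D_OUT, rank)
    print(f"{name}: rank={rank}, added params={added}, {added / FULL:.2%} of the full layer")
```

A higher rank gives the adapter more capacity, which is plausibly why it suits a style learned from many images, while a low rank may be enough for a single subject learned from a handful of consistent images.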