huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

SD3 fine-tuning does not produce a better model (hand and face deformation)? #8748

Open KaiWU5 opened 3 days ago

KaiWU5 commented 3 days ago

Describe the bug

I want to fine-tune SD3 to improve its human generation quality using a dataset of 3 million high-quality human images (which has proven useful on SDXL and other models). However, hand and face deformation has not improved much after two days of training.

I am using the train script.

What I have tried so far:

  1. Regular training on the 3 million images with batch size 2x24 (2 per GPU on 24 V100s) for 2 epochs, with lr 5e-6 and the AdamW optimizer
  2. Training with the Prodigy optimizer using the same settings
  3. Adding q, k RMS norm to each attention layer
  4. Training only several blocks
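For readers unfamiliar with items 3 and 4, here is a minimal, framework-free sketch of the two ideas. It is not the code used in these runs. `rms_norm` shows only the arithmetic of RMS normalization applied to a single query or key vector, and `is_trainable` shows one way to pick parameters by block index. The `transformer_blocks.<index>.` naming pattern, the example parameter names, and the chosen block indices are all assumptions made for illustration.

```python
import math
import re


def rms_norm(vec, weight=None, eps=1e-6):
    """RMS-normalize one query or key vector given as a list of floats.

    Each element is divided by sqrt(mean(x^2) + eps), then multiplied by an
    optional per-channel learnable weight (defaults to all ones).
    """
    mean_sq = sum(v * v for v in vec) / len(vec)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    if weight is None:
        weight = [1.0] * len(vec)
    return [v * scale * w for v, w in zip(vec, weight)]


# Assumed naming scheme: "transformer_blocks.<index>.<rest of the name>".
_BLOCK_RE = re.compile(r"^transformer_blocks\.(\d+)\.")


def is_trainable(param_name, train_blocks):
    """Return True only for parameters inside one of the selected blocks."""
    match = _BLOCK_RE.match(param_name)
    return match is not None and int(match.group(1)) in train_blocks


# Hypothetical parameter names, for illustration only.
names = [
    "pos_embed.proj.weight",
    "transformer_blocks.0.attn.to_q.weight",
    "transformer_blocks.11.attn.to_k.weight",
    "transformer_blocks.23.ff.net.0.proj.weight",
]

# Keep only parameters that live in blocks 11 and 23 (arbitrary choice).
selected = [n for n in names if is_trainable(n, {11, 23})]
normalized = rms_norm([3.0, 4.0])
```

In a PyTorch training script the same selection would typically be applied by looping over `model.named_parameters()` and calling `requires_grad_(is_trainable(name, blocks))` on each parameter, so that the optimizer only updates the chosen blocks.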

All of my training runs give nearly the same deformed results; the hands never look like normal human hands.

Could you provide more experiments on SD3 training? There seems to be no easy way to adapt SD3 for human generation.

Reproduction

As described in the bug description above.

Logs

No response

System Info

24 V100 GPUs, batch size 2 per card, 3 million human images with aesthetic score > 4.5
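To put these numbers in terms of optimizer steps, the arithmetic can be sketched as follows. It assumes the dataset is exactly 3,000,000 images and that no gradient accumulation is used; neither is stated in the report, so treat the step counts as rough estimates.

```python
import math

num_gpus = 24              # V100 cards, from the report
per_gpu_batch = 2          # batch size per card, from the report
dataset_size = 3_000_000   # "3 million" taken as an exact count (assumption)
epochs = 2                 # from the report
grad_accum_steps = 1       # assumption: no gradient accumulation

# Number of samples consumed per optimizer step across all GPUs.
effective_batch = num_gpus * per_gpu_batch * grad_accum_steps

# Optimizer steps needed to see the whole dataset once, and for the full run.
steps_per_epoch = math.ceil(dataset_size / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)  # 48 62500 125000
```

Under these assumptions the effective batch size is 48, which is worth keeping in mind when comparing the learning rate of 5e-6 against recipes that were tuned for much larger batches.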

Who can help?

No response

DN6 commented 3 days ago

Hi @KaiWU5, I think this question would be better asked in the Discussions section.

mliand commented 3 days ago

Can you show me your training loss?

heart-du commented 2 days ago

I have the same question.