bendanzzc opened 6 days ago
Good observation! Thank you for bringing this up!
Yeah, ideally, a reversal of the following would be needed: https://github.com/huggingface/diffusers/blob/0f0b531827900d805f8d2d0a42c1040a1e34bf07/src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py#L893
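A minimal sketch of that reversal: the linked inference line divides the latents by `scaling_factor` and adds `shift_factor` before decoding, so the training side would subtract `shift_factor` and multiply by `scaling_factor` after encoding. The numeric values below are illustrative stand-ins for `vae.config`, not verified config values:

```python
# Sketch of the shift/scale round trip. SHIFT_FACTOR and SCALING_FACTOR
# stand in for vae.config.shift_factor and vae.config.scaling_factor;
# the numbers are illustrative, not the actual SD3 config values.
SHIFT_FACTOR = 0.0609
SCALING_FACTOR = 1.5305

def to_model_space(latent_sample):
    # Training side: the proposed reversal, applied to the VAE
    # posterior sample before adding noise.
    return (latent_sample - SHIFT_FACTOR) * SCALING_FACTOR

def to_vae_space(model_latents):
    # Inference side: what the linked pipeline line does before
    # vae.decode (latents / scaling_factor + shift_factor).
    return model_latents / SCALING_FACTOR + SHIFT_FACTOR
```

Round-tripping a value through both functions returns it unchanged, which is the consistency the training-side fix would restore.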
Do you want to give it a try and open a PR, perhaps? Happy to help you through the process.
Thanks, I'd like to try
Lovely. Thanks so much.
I've just implemented this in my own training code, which is largely based on the diffusers example, and it does seem to noticeably help with image crispness in some with/without comparisons on the same training data (though with non-deterministic seed choice, image ordering, prompt shuffling, etc.).
Do you wanna show some comparisons?
I only kept one image, sorry. I tried training on the character Ahsoka as the toughest example in my dataset.
The left is training without handling shift; the right is with handling shift, on approximately the same prompt (with some shuffling) at about the same number of steps. Without applying shift, the previews all looked blurry like the left sample (after a few epochs), whereas with shift there was a mix of blurry and crisp previews, so it seemed to be helping. The preview samples were always generated with shift applied from the start, since they use a different code path.
This was full finetuning, rather than LoRA training.
Describe the bug
shift_factor is missing in the training code: https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora_sd3.py#L1617, but it is used in the inference code: https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py#L893. Is it reasonable that when training SD3 we do not normalize the latents using vae.config.shift_factor and vae.config.scaling_factor?
Thanks
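For illustration, a hedged sketch of what the training-side encoding could look like with the shift applied (the variable names and numeric values are assumptions for this example, not the script's exact code):

```python
# Hypothetical shape of the training-side latent normalization.
# shift_factor / scaling_factor mirror the vae.config attributes;
# the numbers are illustrative stand-ins, not verified values.
shift_factor = 0.0609
scaling_factor = 1.5305

def encode_latents(latent_sample, shift=shift_factor, scale=scaling_factor):
    # The training script currently applies only the scale:
    #   model_input = latent_sample * scale
    # Subtracting the shift first makes this the exact inverse of the
    # inference-side `latents / scaling_factor + shift_factor`.
    return (latent_sample - shift) * scale
```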
Reproduction
None
Logs
No response
System Info
None
Who can help?
@sayakpaul