Open tobiasfshr opened 1 month ago
I tried training an SD3 ControlNet, but the validation results are really bad and the training loss oscillates the whole time. You can see the results in this discussion: https://github.com/huggingface/diffusers/discussions/9675
Do you have any suggestions for getting better results when training an SD3 ControlNet? Thank you!
I also found this bug, but when I test the https://huggingface.co/alimama-creative/SD3-Controlnet-Inpainting repo, controlnet_pooled_projections = torch.zeros_like(pooled_prompt_embeds) is correct.
Could you please describe the bug? Maybe I have the same bug as you.
The controlnet_pooled_projections variable differs between training and inference: training uses pooled_prompt_embeds, while inference uses torch.zeros_like(pooled_prompt_embeds).
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Cc: @yiyixuxu
@sayakpaul @yiyixuxu I can make a PR if you approve of the changes (if clause is a clear bug, but the missing shift is debatable).
Sorry for the delay on our end, @tobiasfshr. The team was on a company-wide vacation. Yiyi will respond to your queries soon.
Describe the bug
Hi,
I think I found an issue that causes a misalignment between training and inference in SD3 ControlNet.
https://github.com/huggingface/diffusers/blob/a3e8d3f7deed140f57a28d82dd0b5d965bd0fb09/src/diffusers/pipelines/controlnet_sd3/pipeline_stable_diffusion_3_controlnet.py#L977
I think the if-else block starting there is not correct. It should be consistent with training, where the pooled_prompt_embeds are fed to the model: https://github.com/huggingface/diffusers/blob/a3e8d3f7deed140f57a28d82dd0b5d965bd0fb09/examples/controlnet/train_controlnet_sd3.py#L1293
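To make the mismatch concrete, here is a minimal sketch of the two behaviours described in this thread. It is not the diffusers code: the helper name and the force_zeros flag are hypothetical, and plain Python lists stand in for the pooled_prompt_embeds tensor so the snippet runs without torch.

```python
def pick_controlnet_pooled_projections(pooled_prompt_embeds, force_zeros):
    """Hypothetical helper (not a diffusers API).

    Chooses what gets passed to the ControlNet as pooled projections.
    force_zeros=True mimics torch.zeros_like(pooled_prompt_embeds);
    force_zeros=False passes the prompt embeddings through unchanged.
    """
    if force_zeros:
        return [0.0 for _ in pooled_prompt_embeds]
    return list(pooled_prompt_embeds)


# Stand-in values for a pooled prompt embedding (illustrative only).
pooled_prompt_embeds = [0.25, -1.5, 0.75]

# Training, as reported in the issue: the real embeddings are used.
train_input = pick_controlnet_pooled_projections(pooled_prompt_embeds, force_zeros=False)

# Inference, as reported in the issue: zeros are used instead.
inference_input = pick_controlnet_pooled_projections(pooled_prompt_embeds, force_zeros=True)

# The ControlNet sees different conditioning at train and inference time.
mismatch = train_input != inference_input
print(mismatch)
```

An explicit switch like this would also cover the case mentioned in the comments, where a checkpoint was trained with zeroed pooled projections and zeros are therefore the correct inference input.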
Additionally, I am wondering whether this line: https://github.com/huggingface/diffusers/blob/a3e8d3f7deed140f57a28d82dd0b5d965bd0fb09/examples/controlnet/train_controlnet_sd3.py#L1287 should be aligned with this line: https://github.com/huggingface/diffusers/blob/a3e8d3f7deed140f57a28d82dd0b5d965bd0fb09/examples/controlnet/train_controlnet_sd3.py#L1257
Aligning them seems to be the more sensible approach, but it will probably not make much difference, since the ControlNet can also learn the shift. It might speed up convergence slightly.
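The linked lines are not reproduced in this thread, so the following is only a sketch of one plausible reading: that the "shift" is a constant offset subtracted from latents before scaling, applied to the model input but not to the ControlNet conditioning image. The function name and the numeric values are made up for illustration and are not taken from any real VAE config.

```python
def normalize_latents(latents, scaling_factor, shift_factor=0.0):
    """Hypothetical helper: (latent - shift) * scale, element-wise."""
    return [(x - shift_factor) * scaling_factor for x in latents]


# Made-up illustration values, NOT real config values.
SCALING, SHIFT = 2.0, 0.25

raw_latents = [0.5, 1.0, -1.0]

# Assumed treatment of the model input: shift, then scale.
with_shift = normalize_latents(raw_latents, SCALING, SHIFT)

# Assumed treatment of the conditioning latents: scale only.
without_shift = normalize_latents(raw_latents, SCALING)

# The two differ by the same constant (SHIFT * SCALING) in every element.
offsets = [b - a for a, b in zip(with_shift, without_shift)]
print(offsets)
```

Under this reading the difference is a uniform constant offset, which is consistent with the remark that the ControlNet can learn to compensate for it.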
Best, Tobias
Reproduction
Train an SD3 ControlNet; the code path above is executed during log_validation.
Logs
No response
System Info
diffusers==0.30.3
Who can help?
@yiyixuxu @sayakpaul