[SD3 ControlNet] bug in pipeline 'controlnet_pooled_projections'

tobiasfshr commented 1 month ago

Describe the bug

Hi,

I think I found an issue that causes a misalignment between training and inference in SD3 ControlNet.

https://github.com/huggingface/diffusers/blob/a3e8d3f7deed140f57a28d82dd0b5d965bd0fb09/src/diffusers/pipelines/controlnet_sd3/pipeline_stable_diffusion_3_controlnet.py#L977

I think the if-else block starting there is not correct. It should be:

        if controlnet_pooled_projections is None and pooled_prompt_embeds is None:
            controlnet_pooled_projections = torch.zeros_like(pooled_prompt_embeds)
        elif controlnet_pooled_projections is None:
            controlnet_pooled_projections = pooled_prompt_embeds

This is because, in training, pooled_prompt_embeds is what gets fed to the model: https://github.com/huggingface/diffusers/blob/a3e8d3f7deed140f57a28d82dd0b5d965bd0fb09/examples/controlnet/train_controlnet_sd3.py#L1293

Additionally, I am wondering whether this line: https://github.com/huggingface/diffusers/blob/a3e8d3f7deed140f57a28d82dd0b5d965bd0fb09/examples/controlnet/train_controlnet_sd3.py#L1287 should be aligned with this line: https://github.com/huggingface/diffusers/blob/a3e8d3f7deed140f57a28d82dd0b5d965bd0fb09/examples/controlnet/train_controlnet_sd3.py#L1257

Aligning them seems like the more sensible approach, but it will probably not make much difference, since the ControlNet can also learn the shift. It might speed up convergence slightly.
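To make the "ControlNet can learn the shift" point concrete, here is a minimal sketch. It assumes the two linked lines are latent normalizations, one that subtracts a shift factor before scaling and one that only scales. The constants and function names are illustrative placeholders, not values read from the training script.

```python
# Illustrative constants only (assumed); a real run would take them from the VAE config.
SHIFT_FACTOR = 0.0609
SCALING_FACTOR = 1.5305


def scale_with_shift(latent: float) -> float:
    """Normalization that subtracts the shift before scaling."""
    return (latent - SHIFT_FACTOR) * SCALING_FACTOR


def scale_without_shift(latent: float) -> float:
    """Normalization that only scales, with no shift."""
    return latent * SCALING_FACTOR


# The gap between the two variants does not depend on the latent value:
# it is always SHIFT_FACTOR * SCALING_FACTOR.
for latent in (-1.0, 0.0, 2.5):
    gap = scale_without_shift(latent) - scale_with_shift(latent)
    print(f"latent={latent:+.1f} gap={gap:.6f}")
```

Under that assumption the two variants differ by a constant offset, which a bias term in the ControlNet's first layer could absorb. That is consistent with the expectation that aligning them changes convergence speed more than final quality.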

Best, Tobias

Reproduction

Train an SD3 ControlNet; the affected code path is executed during log_validation.

Logs

No response

System Info

diffusers==0.30.3

Who can help?

@yiyixuxu @sayakpaul

xduzhangjiayu commented 1 month ago

I tried training an SD3 ControlNet, but the validation results are really bad and the training loss kept oscillating the whole time. You can see the results in this discussion: https://github.com/huggingface/diffusers/discussions/9675

Do you have any suggestions for getting better results when training an SD3 ControlNet? Thank you!

egbertYeah commented 1 month ago

I also found this bug. However, when I tested the https://huggingface.co/alimama-creative/SD3-Controlnet-Inpainting repo, controlnet_pooled_projections = torch.zeros_like(pooled_prompt_embeds) turned out to be correct for that model.

xduzhangjiayu commented 1 month ago

> I also found this bug. However, when I tested the https://huggingface.co/alimama-creative/SD3-Controlnet-Inpainting repo, controlnet_pooled_projections = torch.zeros_like(pooled_prompt_embeds) turned out to be correct for that model.

Could you please describe the bug? I may be running into the same one.

egbertYeah commented 1 month ago

> > I also found this bug. However, when I tested the https://huggingface.co/alimama-creative/SD3-Controlnet-Inpainting repo, controlnet_pooled_projections = torch.zeros_like(pooled_prompt_embeds) turned out to be correct for that model.
>
> Could you please describe the bug? I may be running into the same one.

The controlnet_pooled_projections variable differs between training and inference: training uses pooled_prompt_embeds, while inference uses torch.zeros_like(pooled_prompt_embeds).
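The two observations in this thread (the training script feeds pooled_prompt_embeds, while some released checkpoints work correctly with zeros) could be reconciled by making the choice explicit. The following is only a sketch: plain Python lists stand in for tensors so it runs without PyTorch, and the function name and the `use_zeros` switch are hypothetical names introduced for illustration.

```python
from typing import List, Optional

Vector = List[float]  # stand-in for a pooled-embedding tensor


def resolve_controlnet_pooled(
    controlnet_pooled: Optional[Vector],
    pooled_prompt_embeds: Vector,
    use_zeros: bool,
) -> Vector:
    """Pick the pooled projections handed to the ControlNet.

    `use_zeros` is a hypothetical switch introduced for this sketch.
    """
    if controlnet_pooled is not None:
        # A value supplied explicitly by the caller always wins.
        return controlnet_pooled
    if use_zeros:
        # Plays the role of torch.zeros_like(pooled_prompt_embeds).
        return [0.0] * len(pooled_prompt_embeds)
    # Mirrors the training script, which feeds pooled_prompt_embeds.
    return pooled_prompt_embeds


pooled = [0.3, -1.2, 0.8]
print(resolve_controlnet_pooled(None, pooled, use_zeros=True))
print(resolve_controlnet_pooled(None, pooled, use_zeros=False))
```

With a switch like this, a checkpoint trained on zeroed projections and one trained with the example script could each be run in the configuration it saw during training.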

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

sayakpaul commented 1 week ago

Cc: @yiyixuxu

tobiasfshr commented 1 week ago

@sayakpaul @yiyixuxu I can make a PR if you approve of the changes (the if clause is a clear bug, but the missing shift is debatable).

sayakpaul commented 1 week ago

Sorry for the delay on our end, @tobiasfshr. The team was on a company-wide vacation. Yiyi will respond to your queries soon.
