huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Attention masks are missing in SD3 to mask out text padding tokens #8673

Open reminisce opened 1 week ago

reminisce commented 1 week ago

Describe the bug

The SD3 attention implementation currently does not use attention masks. As a result, outputs are inconsistent across different values of max_seq_length whenever the text tokens include padding, because the padding tokens receive non-zero attention scores. The problem was discussed in https://github.com/huggingface/diffusers/discussions/8628, and this issue was created to track progress on a fix.
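To make the effect concrete, here is a minimal, dependency-free sketch of single-query scaled dot-product attention over a padded text sequence. It is a toy illustration, not the diffusers implementation: the `attention` and `pad` helpers, the `key_mask` argument, and all vectors are hypothetical and chosen only for the demonstration. It assumes padding tokens have non-zero embeddings, so they contribute to the softmax unless they are masked out.

```python
import math


def attention(query, keys, values, key_mask=None):
    """Single-query scaled dot-product attention (toy, pure Python).

    key_mask[i] is True for real tokens and False for padding tokens.
    key_mask=None means no masking, which is the behavior the issue describes.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    if key_mask is not None:
        # Masked positions get -inf so their softmax weight is exactly zero.
        scores = [s if keep else float("-inf") for s, keep in zip(scores, key_mask)]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def pad(tokens, max_seq_length, pad_vector):
    """Right-pad a token list to max_seq_length and return (padded, mask)."""
    n_pad = max_seq_length - len(tokens)
    return tokens + [pad_vector] * n_pad, [True] * len(tokens) + [False] * n_pad


# Hypothetical toy data: three real text tokens and a non-zero padding embedding.
query = [0.5, -1.0]
text_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pad_vector = [0.3, 0.3]

unmasked, masked = {}, {}
for max_seq_length in (4, 8):
    padded, mask = pad(text_tokens, max_seq_length, pad_vector)
    # Keys and values are the same vectors here to keep the example small.
    unmasked[max_seq_length] = attention(query, padded, padded)
    masked[max_seq_length] = attention(query, padded, padded, key_mask=mask)
    print(max_seq_length, unmasked[max_seq_length], masked[max_seq_length])
```

In this sketch the unmasked output drifts toward the padding vector as max_seq_length grows, while the masked output matches the result computed from the unpadded tokens at both lengths. In PyTorch, the analogous step would be passing a mask through the `attn_mask` argument of `torch.nn.functional.scaled_dot_product_attention`; how that should be wired into the SD3 attention processors is left open here.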

Thanks @sayakpaul for the discussion.

Reproduction

n/a

Logs

No response

System Info

n/a

Who can help?

No response

rootonchair commented 1 week ago

Hi @sayakpaul, I am interested in working on this issue

sayakpaul commented 1 week ago

Thanks for your interest! Sure, let’s go.