Attention masks are missing in SD3 to mask out text padding tokens

reminisce commented 1 week ago

Describe the bug

In the SD3 attention implementation, attention masks are currently not used. As a result, outputs are inconsistent across different values of max_seq_length whenever the text tokens contain padding, because the attention scores of the padding tokens are non-zero. This was discussed in https://github.com/huggingface/diffusers/discussions/8628; this issue is created to track progress on a fix.
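
To illustrate the effect, here is a minimal toy sketch in plain Python. It is not the diffusers code: `attention`, `padded`, and all the numbers are made up for illustration. It assumes a single query attending over a few "real" tokens plus a variable number of padding tokens, and compares the output with and without a mask.

```python
import math

def attention(query, keys, values, mask=None):
    """Single-query scaled dot-product attention over plain lists of floats.

    mask[i] is True for a real token and False for padding. Masked positions
    get a score of -inf, so their softmax weight is exactly 0.
    Assumes at least one position is unmasked.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    if mask is not None:
        scores = [s if keep else -math.inf for s, keep in zip(scores, mask)]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Hypothetical toy data: two real tokens and one repeated padding embedding.
query = [1.0, 0.0]
real_keys = [[1.0, 0.0], [0.0, 1.0]]
real_values = [[1.0, 2.0], [3.0, 4.0]]
pad_key, pad_value = [0.5, 0.5], [9.0, 9.0]

def padded(n_pad):
    """Append n_pad padding tokens, mimicking a larger max_seq_length."""
    keys = real_keys + [pad_key] * n_pad
    values = real_values + [pad_value] * n_pad
    mask = [True] * len(real_keys) + [False] * n_pad
    return keys, values, mask

baseline = attention(query, real_keys, real_values)  # no padding at all
print("no padding:", baseline)
for n_pad in (1, 3):
    keys, values, mask = padded(n_pad)
    print(n_pad, "pad tokens, unmasked:", attention(query, keys, values))
    print(n_pad, "pad tokens, masked:  ", attention(query, keys, values, mask))
```

Without the mask, the result drifts as more padding tokens are appended; with the mask, it matches the unpadded baseline regardless of the amount of padding.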

Thanks @sayakpaul for the discussion.

Reproduction

n/a

Logs

No response

System Info

n/a

Who can help?

No response

rootonchair commented 1 week ago

Hi @sayakpaul, I am interested in working on this issue

sayakpaul commented 1 week ago

Thanks for your interest! Sure, let’s go.
