### System Info
- 🤗 Diffusers version: 0.30.0.dev0 (also reproducible in the released versions v0.29.0 through v0.30.0)
- Platform: Linux-5.15.0-113-generic-x86_64-with-glibc2.35
- Running on a notebook?: No
- Running on Google Colab?: No
- Python version: 3.10.14
- PyTorch version (GPU?): 2.3.1+cu121 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.23.4
- Transformers version: 4.42.3
- Accelerate version: 0.32.1
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.4.3
- xFormers version: not installed
- Accelerator: 4x Tesla V100-SXM2-16GB, 16384 MiB VRAM each
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
### Who can help?
@yiyixuxu @sayakpaul @DN6 @asomoza
Thanks for helping!
### Describe the bug
When using the `FeedForward` module in `diffusers.models.attention`, I've observed discrepancies in the results when processing subsets of the original input that vary in sequence length. Shouldn't the feed-forward computation be independent of sequence length, since it applies the same MLP to every token position?

### Reproduction
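A minimal sketch of the kind of comparison I mean; the hidden size (320), the sequence lengths (4096 vs. the first 1024 tokens), and the use of `torch.allclose` are illustrative assumptions, not my exact script:

```python
import torch
from diffusers.models.attention import FeedForward

torch.manual_seed(0)

dim = 320
ff = FeedForward(dim, activation_fn="geglu").eval().cuda()

# One batch, 4096 tokens; FeedForward acts only on the last (feature)
# dimension, so each token position should be processed independently.
x = torch.randn(1, 4096, dim, device="cuda")

with torch.no_grad():
    full = ff(x)                 # run the whole sequence at once
    subset = ff(x[:, :1024, :])  # run only the first 1024 tokens

# If the output were truly independent of sequence length, the first
# 1024 rows of both outputs would match bitwise: allclose would print
# True and the max absolute difference would be 0.0.
print(torch.allclose(full[:, :1024, :], subset))
print((full[:, :1024, :] - subset).abs().max().item())
```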
### Logs