Scalable Diffusion Models with Transformers #77
Open · tatsuropfgt opened this issue 9 months ago

tatsuropfgt commented 9 months ago
Scalable Diffusion Models with Transformers [ICCV'23]
Abst
propose diffusion models (DiTs) built on the transformer architecture, which has good scalability properties
explore 4 ways to add conditional information: In-context, Cross-attention, adaLN, adaLN-Zero
Method
Patchify
convert the spatial input into a sequence of T tokens, each of dimension d
apply frequency-based positional embeddings (the sine-cosine version)
smaller patch size p => bigger T (T = (I/p)^2); halving p quadruples T and hence Gflops
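A minimal PyTorch sketch of the patchify step with fixed sine-cosine positional embeddings. The shapes (`in_ch=4`, `img=32`) assume DiT's 32x32x4 latent input and `d=1152` matches DiT-XL; the class/function names are mine, not the official code, and the 1D positional embedding is a simplification of the paper's 2D variant:

```python
import math

import torch
import torch.nn as nn


def sincos_pos_embed(dim: int, num_tokens: int) -> torch.Tensor:
    """Fixed (non-learned) sine-cosine positional embeddings.

    1D variant for brevity; DiT uses the 2D version over the patch grid.
    """
    pos = torch.arange(num_tokens, dtype=torch.float32).unsqueeze(1)       # (T, 1)
    freqs = torch.exp(-math.log(10000.0) * torch.arange(0, dim, 2) / dim)  # (dim/2,)
    angles = pos * freqs                                                   # (T, dim/2)
    return torch.cat([angles.sin(), angles.cos()], dim=1)                  # (T, dim)


class Patchify(nn.Module):
    def __init__(self, in_ch: int = 4, p: int = 2, d: int = 1152, img: int = 32):
        super().__init__()
        # A stride-p conv is equivalent to cutting the input into p x p
        # patches and linearly projecting each one to dimension d.
        self.proj = nn.Conv2d(in_ch, d, kernel_size=p, stride=p)
        num_tokens = (img // p) ** 2  # T = (I/p)^2: halving p quadruples T
        self.register_buffer("pos", sincos_pos_embed(d, num_tokens))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, I, I) latent -> (B, T, d) token sequence
        tokens = self.proj(x).flatten(2).transpose(1, 2)
        return tokens + self.pos
```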
To add conditional information
In-context Conditioning
append the vector embeddings of timestep t and label c as two extra tokens in the input sequence, treated no differently from the image tokens
Cross-attention
concatenate the embeddings of t and c into a length-two sequence and attend to it with an additional multi-head cross-attention layer (~15% Gflops overhead)
Adaptive layer norm (adaLN) block
replace standard layer norm layers in transformer blocks with adaptive layer norm, which regresses the scale and shift parameters from the sum of the embeddings of t and c
adaLN-Zero
additionally regress dimension-wise scaling parameters applied just before each residual connection, zero-initialized so that each residual block starts as the identity function (best FID of the four; sketched below)
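A hypothetical PyTorch sketch of one adaLN-Zero transformer block (names and layer details are mine; the official DiT code differs, e.g. in the attention implementation):

```python
import torch
import torch.nn as nn


class DiTBlock(nn.Module):
    """Transformer block with adaLN-Zero conditioning (illustrative sketch)."""

    def __init__(self, d: int = 1152, n_heads: int = 16):
        super().__init__()
        # LayerNorms carry no learned affine params; scale/shift come
        # from the conditioning vector instead.
        self.norm1 = nn.LayerNorm(d, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d, elementwise_affine=False)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        # Regress shift (beta), scale (gamma), and residual gate (alpha)
        # for both branches: 6 vectors of size d per block.
        self.modulation = nn.Sequential(nn.SiLU(), nn.Linear(d, 6 * d))
        # adaLN-Zero: zero-init so every alpha starts at 0 and each
        # residual block is initialized as the identity function.
        nn.init.zeros_(self.modulation[1].weight)
        nn.init.zeros_(self.modulation[1].bias)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (B, T, d) tokens; cond: (B, d) sum of t and c embeddings
        b1, g1, a1, b2, g2, a2 = [
            m.unsqueeze(1) for m in self.modulation(cond).chunk(6, dim=-1)
        ]
        h = self.norm1(x) * (1 + g1) + b1
        x = x + a1 * self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x) * (1 + g2) + b2
        return x + a2 * self.mlp(h)
```

With the modulation layer zero-initialized, every gate alpha is 0 at the first step, so the block passes its input through unchanged; the paper finds this identity initialization accelerates large-scale training.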
Result
DiT inherits the scaling properties of transformers: increasing model size and the number of input tokens (i.e., Gflops) consistently improves FID
Memo
this architecture is used in Sora (OpenAI)