Scalable Diffusion Models with Transformers #77
Open · tatsuropfgt opened this issue 9 months ago

tatsuropfgt commented 9 months ago
Scalable Diffusion Models with Transformers [ICCV'23]
Abst
propose diffusion models (DiTs) built on the transformer architecture, which has good scalability properties
explore 4 ways to add conditional information: In-context, Cross-attention, adaLN, adaLN-Zero
Method
Patchify
convert the spatial input into a sequence of T tokens, each of dimension d
apply frequency-based positional embeddings (the sine-cosine version)
smaller patch size p => bigger T (T = (I/p)^2); halving p quadruples T and hence Gflops
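A minimal PyTorch sketch of the patchify step with fixed sine-cosine positional embeddings. The shapes (`in_ch=4`, `img=32`) assume DiT's 32x32x4 latent input and `d=1152` matches DiT-XL; the class/function names are mine, not the official code, and the 1D positional embedding is a simplification of the paper's 2D variant:

```python
import math

import torch
import torch.nn as nn


def sincos_pos_embed(dim: int, num_tokens: int) -> torch.Tensor:
    """Fixed (non-learned) sine-cosine positional embeddings.

    1D variant for brevity; DiT uses the 2D version over the patch grid.
    """
    pos = torch.arange(num_tokens, dtype=torch.float32).unsqueeze(1)       # (T, 1)
    freqs = torch.exp(-math.log(10000.0) * torch.arange(0, dim, 2) / dim)  # (dim/2,)
    angles = pos * freqs                                                   # (T, dim/2)
    return torch.cat([angles.sin(), angles.cos()], dim=1)                  # (T, dim)


class Patchify(nn.Module):
    def __init__(self, in_ch: int = 4, p: int = 2, d: int = 1152, img: int = 32):
        super().__init__()
        # A stride-p conv is equivalent to cutting the input into p x p
        # patches and linearly projecting each one to dimension d.
        self.proj = nn.Conv2d(in_ch, d, kernel_size=p, stride=p)
        num_tokens = (img // p) ** 2  # T = (I/p)^2: halving p quadruples T
        self.register_buffer("pos", sincos_pos_embed(d, num_tokens))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, I, I) latent -> (B, T, d) token sequence
        tokens = self.proj(x).flatten(2).transpose(1, 2)
        return tokens + self.pos
```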
To add conditional information
In-context Conditioning
append the vector embeddings of timestep t and label c as two extra tokens in the input sequence, treated no differently from the image tokens
Cross-attention
concatenate the embeddings of t and c into a length-two sequence and attend to it with an additional multi-head cross-attention layer (~15% Gflops overhead)
Adaptive layer norm (adaLN) block
replace standard layer norm layers in transformer blocks with adaptive layer norm, which regresses the scale and shift parameters from the sum of the embeddings of t and c
adaLN-Zero
additionally regress dimension-wise scaling parameters applied just before each residual connection, zero-initialized so that each residual block starts as the identity function (best FID of the four; sketched below)
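A hypothetical PyTorch sketch of one adaLN-Zero transformer block (names and layer details are mine; the official DiT code differs, e.g. in the attention implementation):

```python
import torch
import torch.nn as nn


class DiTBlock(nn.Module):
    """Transformer block with adaLN-Zero conditioning (illustrative sketch)."""

    def __init__(self, d: int = 1152, n_heads: int = 16):
        super().__init__()
        # LayerNorms carry no learned affine params; scale/shift come
        # from the conditioning vector instead.
        self.norm1 = nn.LayerNorm(d, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d, elementwise_affine=False)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        # Regress shift (beta), scale (gamma), and residual gate (alpha)
        # for both branches: 6 vectors of size d per block.
        self.modulation = nn.Sequential(nn.SiLU(), nn.Linear(d, 6 * d))
        # adaLN-Zero: zero-init so every alpha starts at 0 and each
        # residual block is initialized as the identity function.
        nn.init.zeros_(self.modulation[1].weight)
        nn.init.zeros_(self.modulation[1].bias)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (B, T, d) tokens; cond: (B, d) sum of t and c embeddings
        b1, g1, a1, b2, g2, a2 = [
            m.unsqueeze(1) for m in self.modulation(cond).chunk(6, dim=-1)
        ]
        h = self.norm1(x) * (1 + g1) + b1
        x = x + a1 * self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x) * (1 + g2) + b2
        return x + a2 * self.mlp(h)
```

With the modulation layer zero-initialized, every gate alpha is 0 at the first step, so the block passes its input through unchanged; the paper finds this identity initialization accelerates large-scale training.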
Result
DiT inherits the scaling properties of transformers: increasing model size and the number of input tokens (i.e., Gflops) consistently improves FID
Memo
this architecture is used in Sora (OpenAI)