owenliang / mnist-dits

Diffusion Transformers (DiTs) trained on MNIST dataset

Multi-head attention #2

Open Walterkd opened 4 months ago

Walterkd commented 4 months ago

The README says the "DiT Block uses 3-head attention", but shouldn't that be 4-head attention? train.py sets head=4.
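
The discrepancy can be checked directly. A minimal sketch (not the repo's actual code, and assuming PyTorch) of a DiT-style self-attention layer with 4 heads, matching head=4 from train.py; the embedding size 64 here is a hypothetical value chosen only because it must be divisible by the head count:

```python
import torch
from torch import nn

# Hypothetical sizes: emb_size must be divisible by num_heads.
emb_size, num_heads = 64, 4

# Standard multi-head self-attention, as used inside a DiT block.
attn = nn.MultiheadAttention(embed_dim=emb_size, num_heads=num_heads, batch_first=True)

x = torch.randn(2, 16, emb_size)   # (batch, patch tokens, channels)
out, _ = attn(x, x, x)             # self-attention: query = key = value
print(out.shape)                   # torch.Size([2, 16, 64])
```

With 3 heads this construction would fail, since 64 is not divisible by 3; that is one quick way to confirm which head count the code actually runs with.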