owenliang / mnist-dits

Diffusion Transformers (DiTs) trained on MNIST dataset
55 stars 11 forks

Multi-head attention #2

Open Walterkd opened 6 months ago

Walterkd commented 6 months ago

The README says the "DiT Block uses 3-head attention", but shouldn't that be 4-head attention? train.py sets head=4.
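For context on why the head count matters: in multi-head attention the embedding dimension is split evenly across heads, so it must be divisible by the head count (with head=4, an embedding of 64 gives 4 heads of size 16). The sketch below is a minimal NumPy illustration of that split — the dimensions (seq_len=10, embed_dim=64) are hypothetical and not taken from this repo; only head=4 comes from train.py as quoted above.

```python
import numpy as np

def multi_head_attention(x, num_heads):
    """Toy single-matrix self-attention: split embed_dim across heads.

    x: (seq_len, embed_dim). Q, K, V projections are omitted for brevity;
    this only demonstrates the head-splitting arithmetic.
    """
    seq_len, embed_dim = x.shape
    assert embed_dim % num_heads == 0, "embed_dim must divide evenly across heads"
    head_dim = embed_dim // num_heads
    # (seq_len, embed_dim) -> (num_heads, seq_len, head_dim)
    q = k = v = x.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)
    # scaled dot-product attention, computed per head in parallel
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v  # (num_heads, seq_len, head_dim)
    # merge heads back: (seq_len, embed_dim)
    return out.transpose(1, 0, 2).reshape(seq_len, embed_dim)

out = multi_head_attention(np.ones((10, 64), dtype=np.float32), num_heads=4)
print(out.shape)  # (10, 64)
```

With head=3 and a typical power-of-two embedding size, the divisibility assertion above would fail, which supports reading the README's "3" as a typo for 4.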