issue about the attention block

zoubohao / DenoisingDiffusionProbabilityModel-ddpm-

This may be the simplest implementation of DDPM. You can directly run Main.py to train the UNet on the CIFAR-10 dataset and see the amazing process of denoising.

MIT License

issue about the attention block #26

Open xuanxh1 opened 1 year ago

xuanxh1 commented 1 year ago

In the attention paper, W = QK^T, right? However, in this implementation, W = Q^TK. Is there something wrong?

The first image is the original code in this implementation, and the second is what I think is the correct way.
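
One way to check which product a piece of attention code actually computes is to look at the tensor layouts rather than at whether a transpose appears in the source. The sketch below is not the repository's code: it uses NumPy instead of PyTorch, and the sizes `B`, `C`, `H`, `Wd` and all variable names are hypothetical. It assumes the query is stored with one token per row, shape (B, H*Wd, C). Under that assumption, multiplying by a key stored channels-first, shape (B, C, H*Wd), already gives QK^T with no explicit transpose, while Q^TK would produce a (C, C) matrix instead of an (H*Wd, H*Wd) one.

```python
import numpy as np

rng = np.random.default_rng(0)
B, C, H, Wd = 2, 8, 4, 4  # hypothetical batch, channel and spatial sizes
N = H * Wd                # number of tokens (pixels)

q_map = rng.standard_normal((B, C, H, Wd))
k_map = rng.standard_normal((B, C, H, Wd))

# Tokens-as-rows layout, as in the attention paper: shape (B, N, C)
q_rows = q_map.reshape(B, C, N).transpose(0, 2, 1)
k_rows = k_map.reshape(B, C, N).transpose(0, 2, 1)

# Channels-first layout for the key: shape (B, C, N).
# This is exactly k_rows with its last two axes swapped.
k_cols = k_map.reshape(B, C, N)

# Paper form: Q K^T with an explicit transpose -> (B, N, N)
scores_paper = q_rows @ np.swapaxes(k_rows, -1, -2)

# Same product with no transpose written out, because the key
# is already laid out as (B, C, N) -> (B, N, N)
scores_flat = q_rows @ k_cols

# Q^T K on the row layout mixes channels, not tokens -> (B, C, C)
scores_qtk = np.swapaxes(q_rows, -1, -2) @ k_rows

# Scale and apply a numerically stable softmax over the key axis
scaled = scores_flat / np.sqrt(C)
scaled = scaled - scaled.max(axis=-1, keepdims=True)
weights = np.exp(scaled)
weights = weights / weights.sum(axis=-1, keepdims=True)

print(scores_paper.shape, scores_flat.shape, scores_qtk.shape)
print(np.allclose(scores_paper, scores_flat))
```

If the code in question follows the same layout, the two forms are the same computation; if the query is instead stored channels-first, the conclusion changes, so it is worth printing the shapes just before the matrix product.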