[Add] paper 'Taming Transformers for High-Resolution Image Synthesis' to 'Diffusion Transformer'

mini-sora / minisora

MiniSora: a community that aims to explore the implementation path and future development direction of Sora.

https://github.com/mini-sora/minisora

Apache License 2.0

1.1k stars 144 forks

[Add] paper 'Taming Transformers for High-Resolution Image Synthesis' to 'Diffusion Transformer' #255

Closed chg0901 closed 3 months ago

chg0901 commented 3 months ago

Add paper 'Taming Transformers for High-Resolution Image Synthesis' to 'Diffusion Transformer'

CVPR 2021 paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Esser_Taming_Transformers_for_High-Resolution_Image_Synthesis_CVPR_2021_paper.pdf

GitHub: https://github.com/CompVis/taming-transformers

Project: https://compvis.github.io/taming-transformers/

seifer08ms commented 3 months ago

The approach outlined in this article seems to diverge from the typical diffusion models and diffusion Transformers. It appears to be an architecture that integrates CNNs with Transformers. As it's not specifically geared towards video generation, categorizing it might be a bit tricky.

chg0901 commented 3 months ago

> The approach outlined in this article seems to diverge from the typical diffusion models and diffusion Transformers. It appears to be an architecture that integrates CNNs with Transformers. As it's not specifically geared towards video generation, categorizing it might be a bit tricky.

Right, the main structure is a UNet or an AE.

chg0901 commented 3 months ago

Are there other good works that use only CNNs, without Transformers?

chg0901 commented 3 months ago

This issue is resolved by #345, and we have added a section called "diffusion UNet".