Open jordiae opened 1 year ago
@jordiae i think SOTA for diffusion transformers would be Muse
i'll take a look at DiVAE this weekend, thanks!
The main difference is that in DiVAE the decoder of the image "tokenizer" is a diffusion model. Thanks!
Edit: This should be better than VQGAN (see Table 1 in https://arxiv.org/pdf/2206.00386.pdf)
oh my, it is like a frankenstein haha
DiVAE [1] uses a VQ encoder and a diffusion decoder. Unfortunately, there's no public implementation. It would also be nice to combine that with diffusion Transformers [2].
Anyway, many thanks for all your work!
[1] https://arxiv.org/abs/2206.00386
[2] https://arxiv.org/abs/2212.09748
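To make the request concrete, here is a rough NumPy sketch of the idea as I understand it from [1]: a VQ step turns encoder latents into discrete tokens, and a diffusion decoder is trained to predict noise conditioned on the embeddings of those tokens. This is not the paper's implementation. The names `vq_encode`, `add_noise`, `decoder_loss`, and the dummy `denoiser` are all hypothetical, the "image" is a random array standing in for patches, and the noise schedule is reduced to a single assumed `alpha_bar_t` value.

```python
import numpy as np

def vq_encode(latents, codebook):
    """Return, for each latent vector, the index of the nearest codebook entry."""
    # latents: (N, D), codebook: (K, D) -> squared distances: (N, K)
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def add_noise(x0, alpha_bar_t, rng):
    """Standard DDPM forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return x_t, eps

def decoder_loss(x0, tokens, codebook, denoiser, alpha_bar_t, rng):
    """Noise-prediction MSE for a decoder conditioned on the VQ token embeddings."""
    cond = codebook[tokens]                      # look up embeddings of the tokens
    x_t, eps = add_noise(x0, alpha_bar_t, rng)   # corrupt the clean target
    eps_hat = denoiser(x_t, alpha_bar_t, cond)   # hypothetical learned network
    return float(np.mean((eps_hat - eps) ** 2))

rng = np.random.default_rng(0)
codebook = rng.standard_normal((8, 4))   # K=8 codes of dimension D=4 (assumed sizes)
latents = rng.standard_normal((16, 4))   # stand-in for encoder outputs
x0 = rng.standard_normal((16, 4))        # stand-in for the image patches to reconstruct

tokens = vq_encode(latents, codebook)

# Untrained placeholder that always predicts zero noise; a real decoder would be
# a U-Net or a diffusion Transformer as in [2].
dummy_denoiser = lambda x_t, alpha_bar_t, cond: np.zeros_like(x_t)
loss = decoder_loss(x0, tokens, codebook, dummy_denoiser, 0.5, rng)
print(tokens.shape, loss)
```

Swapping `dummy_denoiser` for a Transformer-based denoiser would be the combination with [2] suggested above; the conditioning interface would stay the same.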