lucidrains / DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
MIT License

Train with taming's VQGAN #328

Open WocMaker opened 3 years ago

WocMaker commented 3 years ago

Hi, I was training on my own dataset with taming's VQ-GAN, and I get a dimension error in the VAE model. [two images attached] Can anyone help me with it? How can I change this part to make it work?

lucidrains commented 3 years ago

@rom1504 could this be related to your recent PR?

WocMaker commented 3 years ago

"@rom1504 could this be related to your recent PR?"

Hi lucidrains, I had checked rom's dalle server repo before. I still have the PR. Do I need to modify anything else, besides adding the --taming arg to use VQGAN as the VAE? [image attached]

WocMaker commented 3 years ago

I ask because when I use OpenAI's VAE with my own dataset, the loss is not decreasing. [image attached]

rom1504 commented 3 years ago

@lucidrains maybe. I will check today (and fix if there's indeed a regression)

@WocMaker no, the --taming option is indeed all that is needed.

The results with the OpenAI VAE are as expected (and match what I measured in my own experiments). Training is about 4 to 8 times slower to converge with the OpenAI VAE, mostly due to its 1024 image sequence length.
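For intuition on the sequence length point, here is a rough sketch of where the 1024 figure can come from. It assumes 256x256 images, a downsampling factor of 8 for the OpenAI VAE, and a factor of 16 for the taming VQGAN checkpoint; none of these settings are stated in this thread, and `image_seq_len` is a hypothetical helper, not part of DALLE-pytorch.

```python
def image_seq_len(image_size: int, downsample_factor: int) -> int:
    """Number of image tokens when a square image is encoded onto a square token grid."""
    if image_size % downsample_factor != 0:
        raise ValueError("image_size must be divisible by downsample_factor")
    side = image_size // downsample_factor
    return side * side


# Assumed settings (not taken from this thread): 256x256 images,
# OpenAI VAE downsamples by 8, taming VQGAN downsamples by 16.
openai_len = image_seq_len(256, 8)
vqgan_len = image_seq_len(256, 16)

# Ratio of image tokens per sample, and the ratio of pairwise
# token interactions under full (dense) self-attention.
token_ratio = openai_len / vqgan_len
attention_ratio = token_ratio ** 2

print(f"OpenAI VAE tokens: {openai_len}, VQGAN tokens: {vqgan_len}")
print(f"token ratio: {token_ratio}, dense attention cost ratio: {attention_ratio}")
```

This is only a back-of-the-envelope estimate of per-step cost under dense attention. The 4 to 8 times figure above is a measured convergence speed, which also depends on the attention types used, the text sequence length, and the dataset.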

WocMaker commented 3 years ago

@rom1504 Thanks, Romain. Maybe there's some problem with my dataset. It's weird that my dataset runs with the OpenAI VAE but results in a dimension error with VQ-GAN. Could you double-check for me that the required dataset format is the same for both VAE setups? orz

rom1504 commented 3 years ago

Hi @WocMaker, I don't think there is a problem with your dataset. There was a recently introduced bug in the DALLE-pytorch code, which I fixed in https://github.com/lucidrains/DALLE-pytorch/pull/329 . You can use this change directly or wait for it to be merged.

lucidrains commented 3 years ago

@rom1504 thank you! merged 🙏

WocMaker commented 3 years ago

@rom1504 @lucidrains Thank you so much!!!