lucidrains / DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
MIT License
11.03k stars 1.07k forks

Weird thing happened on sampler #155

Closed YUHANG-Ma closed 2 years ago

YUHANG-Ma commented 2 years ago

Hi all, I ran into something weird when using your code to train my model. When I use the sampler to generate samples of the train and test data, as shown in https://wandb.ai/veldrovive/dalle2_train_decoder/runs/jkrtg0so?workspace=user-veldrovive, I found that whether we change the network or change `n_sample_data`, the generated pictures seem to stay the same. And when we generate train and test samples separately, the results on the test data are identical to those on the train data. We checked the network output: the embeddings are different, but we get the same pictures in the end. (screenshots attached) We use your code (the new version) and didn't change anything. Could you give us some suggestions about this?

BinglengTang commented 2 years ago

Wow, I have met the same problem. It seems that the sequences of noise generated by `torch.randn_like()` in different runs are always the same. I replaced the single noise call with five separate `noise = torch.randn_like(x)` calls and got different images. Is this a bug in training the DALL-E 2 decoder?

lucidrains commented 2 years ago

i think this is just a consequence of proper seeding?
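A minimal sketch of what this seeding behavior looks like (the variable names and the zero tensor are illustrative, not from the repo): if the global seed is reset before every sampling call, `torch.randn_like` reproduces the exact same noise sequence, so diffusion sampling can collapse to near-identical images even when the conditioning embeddings differ. Drawing again without re-seeding advances the RNG state and gives fresh noise.

```python
import torch

# Hypothetical placeholder for a latent; only its shape matters here.
x = torch.zeros(4)

torch.manual_seed(0)
noise_a = torch.randn_like(x)

torch.manual_seed(0)  # seed reset, e.g. at the start of each sampling run
noise_b = torch.randn_like(x)

# Same seed, same RNG state -> identical noise across "runs".
assert torch.equal(noise_a, noise_b)

# Without re-seeding, the RNG state advances and the noise differs,
# which is what the sampler actually needs between steps.
noise_c = torch.randn_like(x)
assert not torch.equal(noise_a, noise_c)
```

So identical samples across configs would be expected if something in the pipeline re-seeds before sampling; it would not explain identical outputs for genuinely different embeddings.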

BinglengTang commented 2 years ago

The important thing is that the images generated with different input embeddings are always the same.

lucidrains commented 2 years ago

hmm, maybe @Veldrovive Aidan would have some ideas

Veldrovive commented 2 years ago

I am not quite understanding the issue. I see that in the screenshot above some of the output images look the same given different input images which is definitely not expected behavior, but I am not seeing the full situation. It would be better if you had a wandb link so I could get the config file and see how things are changing over the course of training. If either of you, @BinglengTang or @YUHANG-Ma, have that it would make debugging a lot faster.

BinglengTang commented 2 years ago

It seems that we were combining the right images with the wrong image embeddings. The model learned little from the conditioning inputs and generates its output mainly from the noise input!

lucidrains commented 2 years ago

@YUHANG-Ma ohh, why did you close the issue?