Need help with decoder training

I'm training the decoder on CC3M. Is there anything wrong with my training? The loss seems to be going down way too fast, and despite the low training loss values, the samples don't show that it is working. I sample during training by calling decoder.sample() with the CLIP image embeddings of the training images in the minibatch. Since I'm training a decoder with two Unets and only training the first Unet for now, I break out after sampling from the first Unet.
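For illustration, the early-exit sampling described above can be sketched roughly as follows. This is a toy, self-contained sketch and not the DALLE2-pytorch implementation: `sample_cascade`, `make_toy_unet`, and the `stop_at_unet` argument are hypothetical names, and the "Unets" are plain Python functions standing in for real models.

```python
import random


def sample_cascade(unets, image_embed, stop_at_unet=None, steps=4, seed=0):
    """Run a cascade of denoisers in order, optionally stopping early.

    `unets` is a list of callables (x, image_embed, t) -> x. They are
    hypothetical stand-ins for the decoder's Unets, not library objects.
    Returns the last sample and the 1-based index of the last Unet run.
    """
    rng = random.Random(seed)
    x, index = None, 0
    for index, unet in enumerate(unets, start=1):
        # Simplification: every stage starts from fresh noise. A real cascade
        # would also condition later stages on the previous stage's output.
        x = [rng.gauss(0.0, 1.0) for _ in image_embed]
        for t in reversed(range(steps)):
            x = unet(x, image_embed, t)
        if stop_at_unet is not None and index >= stop_at_unet:
            break  # skip the remaining, still untrained Unets
    return x, index


def make_toy_unet(name, log):
    """Build a toy denoiser that records each call in `log`."""
    def unet(x, image_embed, t):
        log.append((name, t))
        # Toy "denoising": move halfway toward the conditioning embedding.
        return [0.5 * (xi + ei) for xi, ei in zip(x, image_embed)]
    return unet


log = []
unets = [make_toy_unet("unet1", log), make_toy_unet("unet2", log)]
sample, last_index = sample_cascade(
    unets, image_embed=[1.0, -1.0, 0.5], stop_at_unet=1
)
```

With `stop_at_unet=1`, only the first stand-in Unet is ever called, which mirrors breaking out of the sampling loop before the untrained second Unet runs.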

[Image: decoder_training_loss]

These are the samples at the 0k, 5k, 13k, 16k, and 17k training steps.

[Images: samples at 13k, 16k, and 17k steps]

lucidrains / DALLE2-pytorch

Need help with decoder training #40