This run has verified distributed training for the first unet of the decoder, and I would like to set out some standard parameters to experiment with going forward.
The above run uses embeddings from vit-l-14 and the following unet parameters (a configuration sketch follows the list):
dim = 512
dim_mults = (1, 2, 3, 4)
attn_dim_head = 32
attn_heads = 16
resnet_groups = 8
num_resnet_blocks = 2
init_cross_embed_kernel_sizes = (3, 7, 15)
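For concreteness, here is how those settings map onto the DALLE2-pytorch `Unet` constructor. This is a minimal sketch, assuming the embeddings come from OpenAI's ViT-L/14 (whose image embeddings are 768-dimensional); the `Decoder` wrapper, `timesteps`, and conditioning dropout below are illustrative placeholders, not values confirmed by this run.

```python
from dalle2_pytorch import Unet, Decoder, OpenAIClipAdapter

# assumption: embeddings were produced by OpenAI CLIP ViT-L/14
clip = OpenAIClipAdapter('ViT-L/14')

# first unet of the decoder, using the parameters listed above
unet = Unet(
    dim = 512,
    image_embed_dim = 768,  # ViT-L/14 image embeddings are 768-dim
    dim_mults = (1, 2, 3, 4),
    attn_dim_head = 32,
    attn_heads = 16,
    resnet_groups = 8,
    num_resnet_blocks = 2,
    init_cross_embed_kernel_sizes = (3, 7, 15)
)

# illustrative wrapper only; timesteps and image_cond_drop_prob are
# placeholders, not the values this run actually used
decoder = Decoder(
    unet = unet,
    clip = clip,
    timesteps = 1000,
    image_cond_drop_prob = 0.1
)
```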
I have no reference point for what could be changed to improve performance. If anyone has insight into which hyperparameter values might help, it would go a long way toward making the guesswork process quicker.
@Veldrovive it is safe to just go with the same hyperparameters as Imagen, since Imagen outperforms DALLE2 anyway. We know at the very least that scaling up the unets is unnecessary.