lucidrains / DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
MIT License

DALLE generating ugly image #384

Open SeungyounShin opened 2 years ago

SeungyounShin commented 2 years ago

This is the result from

```python
vae = DiscreteVAE(
    image_size = 128,
    num_layers = 2,           # number of downsamples - ex. 256 / (2 ** 3) = (32 x 32 feature map)
    num_tokens = 8192,        # number of visual tokens
    codebook_dim = 512,       # codebook dimension
    hidden_dim = 64,          # hidden dimension
    num_resnet_blocks = 1,    # number of resnet blocks
    temperature = 0.9,        # gumbel softmax temperature, the lower this is, the harder the discretization
    straight_through = False  # straight-through for gumbel softmax. unclear if it is better one way or the other
)

dalle = DALLE(
    dim = 512,
    vae = vae,                        # automatically infer (1) image sequence length and (2) number of image tokens
    num_text_tokens = len(tokenizer), # vocab size for text
    text_seq_len = TEXT_LEN,          # text sequence length
    depth = 6,                        # should aim to be 64
    heads = 8,                        # attention heads
    dim_head = 64,                    # attention head dimension
    attn_dropout = 0.1,               # attention dropout
    ff_dropout = 0.1                  # feedforward dropout
).to(DEVICE)
```

after 2 epochs.

The loss decreased from 6 to 3.5.

Do I have to run more epochs to get a reasonable image?
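For reference, the comment on `num_layers` in the config implies a simple relation between image size, number of downsamples, and how many image tokens the transformer has to predict. The sketch below works that arithmetic out for the settings above. It assumes each VAE layer halves the resolution, as the comment suggests; `image_token_grid` is a hypothetical helper written for illustration, not part of DALLE-pytorch.

```python
from typing import Tuple


def image_token_grid(image_size: int, num_layers: int) -> Tuple[int, int]:
    """Return (grid side, image sequence length) for a VAE that
    halves the resolution once per layer (an assumption)."""
    factor = 2 ** num_layers
    if image_size % factor != 0:
        raise ValueError("image_size must be divisible by 2 ** num_layers")
    side = image_size // factor
    # One token per cell of the side x side feature map.
    return side, side * side


# Settings from the config above: image_size = 128, num_layers = 2.
side, seq_len = image_token_grid(128, 2)
print(side, seq_len)  # 32 1024
```

Under that assumption, this config asks the 6-layer transformer to model 1024 image tokens per sample, each drawn from an 8192-entry codebook.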

caronstreet

My training code is here.

rom1504 commented 2 years ago

You'd be better off using VQGAN and an easier dataset, e.g. https://github.com/rom1504/kaggle-fashion-dalle. That's usually a faster way to get good results at low compute.