This is the result from

`vae = DiscreteVAE(
    image_size = 128,
    num_layers = 2,           # number of downsamples - here 128 / (2 ** 2) = 32, i.e. a 32 x 32 feature map
    num_tokens = 8192,        # number of visual tokens
    codebook_dim = 512,       # codebook dimension
    hidden_dim = 64,          # hidden dimension
    num_resnet_blocks = 1,    # number of resnet blocks
    temperature = 0.9,        # gumbel softmax temperature, the lower this is, the harder the discretization
    straight_through = False  # straight-through for gumbel softmax. unclear if it is better one way or the other
)
dalle = DALLE(
    dim = 512,
    vae = vae,                         # automatically infer (1) image sequence length and (2) number of image tokens
    num_text_tokens = len(tokenizer),  # vocab size for text
    text_seq_len = TEXT_LEN,           # text sequence length
    depth = 6,                         # should aim to be 64
    heads = 8,                         # attention heads
    dim_head = 64,                     # attention head dimension
    attn_dropout = 0.1,                # attention dropout
    ff_dropout = 0.1                   # feedforward dropout
).to(DEVICE)`

after 2 epochs.
The loss decreased from 6 to 3.5.
Do I need to train for more epochs to get a reasonable image?
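For reference, the feature-map size and image sequence length implied by this config can be checked with a small sketch. It assumes each downsampling layer halves the resolution, as the `num_layers` comment suggests; `feature_map_side` is a hypothetical helper written for illustration, not part of the library.

```python
def feature_map_side(image_size: int, num_layers: int) -> int:
    """Side length of the token grid, assuming each layer halves the resolution."""
    return image_size // (2 ** num_layers)


# Values taken from the DiscreteVAE config above
side = feature_map_side(image_size=128, num_layers=2)

# One visual token per feature-map cell, so the image sequence is side * side long
image_seq_len = side * side

print(side, image_seq_len)  # 32 1024
```

So with these settings the transformer would be predicting 1024 image tokens per sample, each drawn from the 8192-entry codebook.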
My training code is here: