rosinality / vq-vae-2-pytorch

Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch

Any tips for stage 2 training #5

Closed. 1Konny closed this issue 5 years ago.

1Konny commented 5 years ago

Hi,

I've been thinking of using VQ-VAE-2 in my project and found that you've already done this great thing! Thanks for your implementation.

I've just finished stage 1, followed by the code extraction step, and it produces sound reconstructions on my own data. But when I move on to stage 2, the PixelSNAILs seem to fail to model the distributions of the extracted codes (top code accuracy ~0.4, bottom code accuracy ~0.3).

Now I'm thinking of increasing the capacity of the PixelSNAILs, but I'm not sure which arguments of the PixelSNAIL module I should adjust (e.g. n_block, n_res_block, res_channel, n_cond_res_block, ...).

Do you have any ideas or suggestions on this?

rosinality commented 5 years ago

On the PixelSNAIL arguments:

I don't know whether the architecture in the paper is similar to the implementation in this repository, but I think you can use the hyperparameters from the paper. For example, you can increase channel to 512 and res_channel to 512 or more.
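
As a rough illustration, a sketch of constructing a larger prior along those lines could look like the following. The argument names (channel, res_channel, n_block, n_res_block) are the ones mentioned in this thread; the spatial shape, codebook size, and kernel size are placeholder assumptions, and the exact constructor signature should be checked against pixelsnail.py in this repository.

```python
# Hedged sketch: a higher-capacity top-level prior, assuming the PixelSNAIL
# constructor in this repository accepts the arguments named in this thread.
# shape, n_class and kernel_size below are illustrative placeholders.
from pixelsnail import PixelSNAIL

top_prior = PixelSNAIL(
    shape=[32, 32],    # assumed spatial size of the top code map
    n_class=512,       # assumed codebook size
    channel=512,       # increased, as suggested above
    kernel_size=5,
    n_block=4,
    n_res_block=4,
    res_channel=512,   # increased, as suggested above
)
```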

Also, you can try decreasing the learning rate after the model has somewhat converged.
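
One generic way to do that in PyTorch (not something specific to this repository's training script) is a learning rate scheduler; here is a minimal sketch, assuming you decay at fixed milestone epochs:

```python
# Minimal sketch of step-wise learning rate decay with a standard PyTorch
# scheduler; the stand-in model, learning rate, milestones, and decay factor
# are illustrative assumptions, not values from this repository.
from torch import nn, optim

model = nn.Linear(8, 8)  # stand-in for the PixelSNAIL prior being trained
optimizer = optim.Adam(model.parameters(), lr=3e-4)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 250], gamma=0.1)

for epoch in range(300):
    # ... usual training loop over batches goes here ...
    optimizer.step()   # placeholder; normally called once per batch after backward()
    scheduler.step()   # drops the learning rate by 10x at epochs 150 and 250
```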

1Konny commented 5 years ago

Thanks! I'll try it.