rosinality / vq-vae-2-pytorch

Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch

Any tips for stage 2 training #5

Closed. 1Konny closed this issue 5 years ago.

1Konny commented 5 years ago

Hi,

I've been thinking of using VQ-VAE-2 in my project and found that you've already done this great thing! Thanks for your implementation.

I've just finished stage 1, followed by the code extraction step, and it produces sound reconstructions on my own data. But when I move on to stage 2, the PixelSNAILs seem to fail to model the distributions of the extracted codes (top code accuracy ~0.4, bottom code accuracy ~0.3).

Now I'm thinking of increasing the capacity of the PixelSNAILs, but I'm not sure which arguments of the PixelSNAIL module I should adjust (e.g. n_block, n_res_block, res_channel, n_cond_res_block, ...).

Do you have any ideas or suggestions on this?

rosinality commented 5 years ago

On the PixelSNAIL arguments:

I don't know whether the architecture in the paper is similar to the implementation in this repository, but I think you can use the hyperparameters from the paper. For example, you can increase channel to 512 and res_channel to 512 or more.
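
As a rough illustration, a sketch of constructing a larger prior along those lines could look like the following. The argument names (channel, res_channel, n_block, n_res_block) are the ones mentioned in this thread; the spatial shape, codebook size, and kernel size are placeholder assumptions, and the exact constructor signature should be checked against pixelsnail.py in this repository.

```python
# Hedged sketch: a higher-capacity top-level prior, assuming the PixelSNAIL
# constructor in this repository accepts the arguments named in this thread.
# shape, n_class and kernel_size below are illustrative placeholders.
from pixelsnail import PixelSNAIL

top_prior = PixelSNAIL(
    shape=[32, 32],    # assumed spatial size of the top code map
    n_class=512,       # assumed codebook size
    channel=512,       # increased, as suggested above
    kernel_size=5,
    n_block=4,
    n_res_block=4,
    res_channel=512,   # increased, as suggested above
)
```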

Also, you can try decreasing the learning rate after the model has somewhat converged.
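
One generic way to do that in PyTorch (not something specific to this repository's training script) is a learning rate scheduler; here is a minimal sketch, assuming you decay at fixed milestone epochs:

```python
# Minimal sketch of step-wise learning rate decay with a standard PyTorch
# scheduler; the stand-in model, learning rate, milestones, and decay factor
# are illustrative assumptions, not values from this repository.
from torch import nn, optim

model = nn.Linear(8, 8)  # stand-in for the PixelSNAIL prior being trained
optimizer = optim.Adam(model.parameters(), lr=3e-4)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 250], gamma=0.1)

for epoch in range(300):
    # ... usual training loop over batches goes here ...
    optimizer.step()   # placeholder; normally called once per batch after backward()
    scheduler.step()   # drops the learning rate by 10x at epochs 150 and 250
```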

1Konny commented 5 years ago

Thanks! I'll try it.