Closed 1Konny closed 5 years ago
Some PixelSNAIL arguments:
I don't know whether the architecture in the paper is similar to the implementation in this repository, but I think you can use the hyperparameters from the paper. For example, you can increase channel to 512 and res_channel to 512 or more.
Also, you can try decreasing the learning rate once the model has more or less converged.
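As a rough illustration of the learning-rate suggestion, here is a minimal pure-Python sketch that halves the learning rate when the monitored loss stops improving. The class name `PlateauDecay` and the values for `factor`, `patience`, `min_lr`, and the starting learning rate are assumptions for illustration only, not settings from this repository. In an actual PyTorch training loop, `torch.optim.lr_scheduler.ReduceLROnPlateau` provides similar behavior.

```python
class PlateauDecay:
    """Hypothetical helper: shrink the learning rate when the loss plateaus."""

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-6):
        self.lr = lr                # current learning rate
        self.factor = factor        # multiplier applied on each decay
        self.patience = patience    # non-improving epochs tolerated before decaying
        self.min_lr = min_lr        # lower bound on the learning rate
        self.best = float("inf")    # best loss seen so far
        self.bad_epochs = 0         # consecutive epochs without improvement

    def step(self, loss):
        """Record one epoch's loss and return the learning rate to use next."""
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


# Made-up validation losses: the loss stalls at 1.4, so the rate is halved once.
schedule = PlateauDecay(lr=3e-4)
losses = [2.0, 1.5, 1.4, 1.4, 1.4, 1.4, 1.3]
lrs = [schedule.step(loss) for loss in losses]
```

The returned value would be written into the optimizer's parameter groups after each evaluation pass. Monitoring a held-out loss rather than the training loss is usually the safer signal for when to decay.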
Thanks! I'll try it.
Hi,
I've been thinking of using VQ-VAE 2 in my project and found that you've already implemented it. Thanks for your implementation!
I've just finished stage 1, followed by the code extraction step, and it produces sound reconstructions on my own data. But when I move on to stage 2, the PixelSNAILs seem to fail to model the extracted marginal distributions of the codes (top code accuracy ~ 0.4, bottom code accuracy ~ 0.3).
Now I'm thinking of increasing the capacity of the PixelSNAILs, but I'm not sure which arguments of the PixelSNAIL module I should adjust (e.g. n_block, n_res_block, res_channel, n_cond_res_block, ...).
Do you have any ideas or suggestions on this?