hoyden opened this issue 2 years ago
I am referring to Appendix A (page 13) of the BigVGAN paper.
The upsampled feature is followed by M AMP residual blocks, where each AMP block uses a different kernel size for its stack of dilated 1D convolutions, defined as k_{i,j} (j = {1, ..., M}). --> M = 3, kernel sizes [3, 7, 11]
The j-th AMP block contains L anti-aliased periodic activations and dilated 1D convolutions using dilation rates d_{i,j,l} (l = {1, ..., L}). --> L = 6 (3x2), dilation rates [[1, 1], [3, 1], [5, 1]]
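To make sure we're reading these hyperparameters the same way, here is a minimal sketch of how I map them onto the conv stack (just my reading; the names and the 512-channel width are mine, not from any official implementation):

```python
import torch.nn as nn

kernel_sizes = [3, 7, 11]                # k_j, one per AMP block   -> M = 3
dilations    = [[1, 1], [3, 1], [5, 1]]  # d_l, grouped in pairs    -> L = 6 (3x2)

def amp_conv_stack(channels: int, k: int) -> nn.ModuleList:
    """The dilated 1D convs of one AMP block (activations omitted here)."""
    return nn.ModuleList(
        nn.Conv1d(channels, channels, k, dilation=d, padding=(k - 1) * d // 2)
        for pair in dilations for d in pair
    )

blocks = [amp_conv_stack(512, k) for k in kernel_sizes]  # the M = 3 AMP blocks
```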
So I used the low-pass filter twice in this module.
But I'm not sure this is the same as what the authors intended. When I first implemented this part, I was also confused because the figure did not match the hyperparameters (specifically, the dilation rates).
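To be concrete, here is a rough sketch of what I mean by using it twice: the anti-aliased snake (2x upsample -> snake -> 2x downsample, both low-pass filtered) is placed before each of the two dilated convs. I'm using torchaudio's sinc resampler as a stand-in for the paper's low-pass filter, so take this as an illustration of the structure, not the official implementation:

```python
import torch
import torch.nn as nn
import torchaudio.functional as AF

class AntiAliasedSnake(nn.Module):
    """2x upsample -> snake -> 2x downsample; both resamplings are low-pass filtered."""
    def __init__(self, channels: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1, channels, 1))  # per-channel alpha

    def forward(self, x):                                # x: (B, C, T)
        x = AF.resample(x, orig_freq=1, new_freq=2)      # low-pass interpolation
        x = x + (1.0 / (self.alpha + 1e-9)) * torch.sin(self.alpha * x) ** 2
        return AF.resample(x, orig_freq=2, new_freq=1)   # low-pass decimation

class ResidualUnitTwice(nn.Module):
    """One conv pair of an AMP block, anti-aliased snake before EACH conv."""
    def __init__(self, channels: int, k: int, d1: int, d2: int = 1):
        super().__init__()
        self.act1 = AntiAliasedSnake(channels)
        self.act2 = AntiAliasedSnake(channels)
        self.conv1 = nn.Conv1d(channels, channels, k, dilation=d1,
                               padding=(k - 1) * d1 // 2)
        self.conv2 = nn.Conv1d(channels, channels, k, dilation=d2,
                               padding=(k - 1) * d2 // 2)

    def forward(self, x):
        y = self.conv1(self.act1(x))   # low-pass resampling here...
        y = self.conv2(self.act2(y))   # ...and again here ("twice")
        return x + y
```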
Thanks for your reply. But I think L = 3; it's common practice to add an extra dilated 1D convolution layer (d = 1) after each one. Maybe we should treat each pair of dilated convolutions as a whole and apply the low-pass filter only at the input of the whole module. It's just my opinion ^^
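Something like this is what I have in mind (just a sketch; aa_snake stands for an anti-aliased snake such as the one in your sketch above, and only a plain snake sits between the two convs):

```python
from typing import Optional

import torch
import torch.nn as nn

def snake(x, alpha: float = 1.0):
    """Plain snake1d without any resampling: x + (1/a) * sin^2(a * x)."""
    return x + (1.0 / alpha) * torch.sin(alpha * x) ** 2

class ResidualUnitOnce(nn.Module):
    """One conv pair with the low-pass-filtered activation applied ONCE, at the input."""
    def __init__(self, channels: int, k: int, d1: int, d2: int = 1,
                 aa_snake: Optional[nn.Module] = None):
        super().__init__()
        self.aa_snake = aa_snake if aa_snake is not None else nn.Identity()
        self.conv1 = nn.Conv1d(channels, channels, k, dilation=d1,
                               padding=(k - 1) * d1 // 2)
        self.conv2 = nn.Conv1d(channels, channels, k, dilation=d2,
                               padding=(k - 1) * d2 // 2)

    def forward(self, x):
        y = self.conv1(self.aa_snake(x))  # resampling + snake once, at the input
        y = self.conv2(snake(y))          # plain snake in between: no resampling
        return x + y
```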
Thank you for your interest in this issue.
I agree with your idea of using the low-pass filter only once. However, in that case, I was unsure which activation function to use between the two dilated convs.
The original HiFi-GAN uses leaky ReLU as the activation function in this part, but BigVGAN replaces it with snake1d. Hence, I implemented this part by applying snake and resampling with the low-pass filter twice.
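For reference, the two activations side by side (a sketch; alpha is a fixed scalar here, whereas the paper learns it per channel):

```python
import torch
import torch.nn.functional as F

def hifigan_act(x):
    """HiFi-GAN: plain leaky ReLU before each dilated conv."""
    return F.leaky_relu(x, negative_slope=0.1)

def snake1d(x, alpha: float = 1.0):
    """BigVGAN: snake, x + (1/alpha) * sin^2(alpha * x)."""
    return x + (1.0 / alpha) * torch.sin(alpha * x) ** 2
```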
I just think it would have been nice if the paper had included additional ablation studies on this part.
I'm not sure which implementation is best in these cases. I hope the authors of BigVGAN address these issues with more ablation studies...
Yeah, you're right. I'll try using snake1d directly between the two dilated convs, and I'll share the results of my experiment. Maybe someday NVIDIA will open-source their work. ^^
After some comparison, I found that both models (resampling once or twice) have similar performance.
But resampling twice makes training/inference much slower, so I changed it as you suggested.
Thank you~
https://github.com/sh-lee-prml/BigVGAN/blob/37e49f36e50134de45b407bf2c6b1a61cea09329/models_bigvgan.py#L66-L69
Are these lines necessary? I didn't find them in the paper.