m-toman opened this issue 5 years ago
@m-toman Could you tell me how many samples per second your WaveRNN model generates, and on which GPU?
In my case, with my branch forked from your old repository, a V100 machine generates 1200 samples/sec and a K80 machine generates 1000 samples/sec.
That is still true, as I haven't really finished any new integration yet. I'm currently integrating this fork of the fork of the fatchord model: https://github.com/geneing/WaveRNN-Pytorch which should be a bit faster when using batch synthesis.
Did you have any luck with WaveGlow?
@m-toman nvidia-tacotron2 and nvidia-waveglow are well optimized. In my experiments, a V100 machine can generate 160k samples/sec and 350k samples/sec respectively.
But WaveGlow has a problem with reverb. I'm trying to overcome it. https://github.com/Yeongtae/tacotron2 https://github.com/Yeongtae/waveglow
Impressive, I also wanted to take a look at their repo but can't jump between them all the time ;). I've seen in the WaveGlow issues that the training requires lots of memory to achieve a reasonable batch size.
So I'm using 8 V100 GPUs to train WaveGlow.
@m-toman have you produced any results with this repository? Could you share a sample audio?
@Yeongtae this is the current state for LJ; the main annoyance is those clipping issues, and more training alone doesn't seem to help. samples.zip This is trained from GTA mel specs with the settings in https://github.com/m-toman/WaveRNN-Pytorch/blob/master/hyperparams.py and https://github.com/m-toman/Tacotron-2/blob/master/hparams.py
@m-toman - you mentioned clipping issues, and that's something I'm facing as well. Were you able to track down what causes the clipping?
@ZohaibAhmed I was able to fix most issues by not using the noam learning rate scheduler, setting the learning rate to a fixed value instead and manually lowering it when the loss starts to act funny. I also found that the simple model in https://github.com/h-meru/Tacotron-WaveRNN behaves much more benignly and trains more nicely (on 10-bit quantization with "bits"; "mulaw" also seems to act up) than the alternative WaveRNN model by fatchord.
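For anyone unfamiliar with the two quantization settings mentioned above, this is roughly what they select between (a sketch using the standard formulas, not code copied from either repo; function names are mine):

```python
import numpy as np

def quantize_linear(x, bits=10):
    # "bits" mode: plain linear quantization of a [-1, 1] waveform
    # into 2**bits integer classes.
    return ((x + 1.0) * (2 ** bits - 1) / 2 + 0.5).astype(np.int64)

def quantize_mulaw(x, bits=10):
    # "mulaw" mode: mu-law companding before quantization, which spends
    # more resolution on low-amplitude samples.
    mu = 2 ** bits - 1
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((compressed + 1.0) * mu / 2 + 0.5).astype(np.int64)
```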
@m-toman Thanks. I see that in the repo you referred to, Tacotron uses Narrow Exponential Decay while WaveRNN uses a fixed learning rate. Which learning rate were you referring to when you said "manually lowering it"? The schedule is sketched after the comment block below.
#################################################################
# Narrow Exponential Decay:
# Phase 1: lr = 1e-3
# We only start learning rate decay after 50k steps
# Phase 2: lr in ]1e-5, 1e-3[
# decay reach minimal value at step 310k
# Phase 3: lr = 1e-5
# clip by minimal learning rate value (step > 310k)
#################################################################
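In code, the schedule those comments describe works out to roughly the following (my own sketch; the actual Tacotron-2 code uses TensorFlow's exponential decay with its own decay_steps/decay_rate, so this curve is only an approximation):

```python
def narrow_exponential_decay(step,
                             init_lr=1e-3,        # Phase 1 value
                             final_lr=1e-5,       # Phase 3 floor
                             start_decay=50_000,  # decay only starts after 50k steps
                             decay_end=310_000):  # floor reached at 310k steps
    """Approximate Narrow Exponential Decay as described in the hparams comments."""
    if step <= start_decay:
        return init_lr
    if step >= decay_end:
        return final_lr
    # Exponential interpolation from init_lr down to final_lr
    # over the [start_decay, decay_end] window.
    progress = (step - start_decay) / (decay_end - start_decay)
    return init_lr * (final_lr / init_lr) ** progress
```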
WaveRNN - atm I'm starting out with 1e-4 and, once the loss starts to act funny, I stop training and divide the LR by 10. I'm currently training with MoL, and there it worked well for about 400k steps at batch size 128 before I had to lower it. Perhaps "reduce on plateau" or something similar would also be a good idea.
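Both variants are trivial to wire up in PyTorch; a minimal sketch, assuming an existing `model` and a tracked `val_loss` (both placeholders):

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Option A: manual control - when the loss starts to act funny,
# divide the learning rate of every param group by 10.
def lower_lr(optimizer, factor=10.0):
    for group in optimizer.param_groups:
        group['lr'] /= factor

# Option B: let PyTorch lower it automatically once the loss plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=5)

# Called once per evaluation in the training loop:
# scheduler.step(val_loss)
```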
I'm currently reworking the general training procedure to more easily enable