m-toman / tacorn

2018/2019 TTS framework integrating state of the art open source methods
MIT License
47 stars 4 forks

Framework/glue code plan #13

Open m-toman opened 5 years ago

m-toman commented 5 years ago

I'm currently reworking the general training procedure to more easily enable

Yeongtae commented 5 years ago

@m-toman Could you tell me the number of generated samples per second of your WaveRNN model, and which GPU device you used?

In my case, with my forked branch of your old repository, a V100 machine generates 1200 samples/sec and a K80 machine generates 1000 samples/sec.
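(For reference, a rough sketch of how such a samples/sec figure can be measured; the checkpoint loader, the generate() call, and the mel input below are placeholders, not the actual API of either repository:)

    import time
    import numpy as np
    import torch

    # Rough timing sketch for a samples/sec figure; load_wavernn_checkpoint,
    # model.generate() and the mel file are hypothetical placeholders.
    model = load_wavernn_checkpoint("checkpoint.pt")
    mel = torch.from_numpy(np.load("sample_mel.npy")).unsqueeze(0).cuda()

    torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        wav = model.generate(mel)
    torch.cuda.synchronize()
    elapsed = time.time() - start

    print(f"{len(wav) / elapsed:.0f} samples/sec")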

m-toman commented 5 years ago

That is still true, as I haven't really finished any new integration yet. I'm currently integrating this fork of the fork of the fatchord model: https://github.com/geneing/WaveRNN-Pytorch, which should be a bit faster when using batch synthesis.
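(Roughly, batch synthesis splits the mel spectrogram into overlapping segments, vocodes them as one batch, and cross-fades the overlaps back together. A sketch under assumed parameters; vocode_batch, the hop length and the segment sizes are placeholders, not that repo's actual interface:)

    import numpy as np

    def synthesize_batched(mel, vocode_batch, hop_length=275,
                           seg_frames=200, overlap_frames=25):
        # mel: (n_mels, T); vocode_batch maps a (B, n_mels, seg_frames) batch
        # to (B, seg_frames * hop_length) waveforms. All values here are
        # illustrative, not the repository's actual settings.
        n_mels, total = mel.shape
        stride = seg_frames - overlap_frames

        # Pad so the mel splits evenly into overlapping segments.
        n_segs = max(1, int(np.ceil((total - overlap_frames) / stride)))
        padded = np.zeros((n_mels, n_segs * stride + overlap_frames), dtype=mel.dtype)
        padded[:, :total] = mel

        # Collect the overlapping segments into one batch and vocode them together.
        segs = np.stack([padded[:, i * stride: i * stride + seg_frames]
                         for i in range(n_segs)])
        wavs = vocode_batch(segs)

        # Stitch the waveforms back together with a linear cross-fade per overlap.
        ov = overlap_frames * hop_length
        seg_len = seg_frames * hop_length
        fade_in = np.linspace(0.0, 1.0, ov)
        out = np.zeros(n_segs * stride * hop_length + ov)
        for i, w in enumerate(wavs):
            w = np.array(w, dtype=np.float64)
            if i > 0:
                w[:ov] *= fade_in
            if i < n_segs - 1:
                w[-ov:] *= 1.0 - fade_in
            start = i * stride * hop_length
            out[start:start + seg_len] += w
        return out[: total * hop_length]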

Did you have any luck with WaveGlow?

Yeongtae commented 5 years ago

@m-toman nvidia-tacotron2 and nvidia-waveglow are well optimized. In my experiment, a V100 machine can generate about 160k samples/sec and 350k samples/sec, respectively.

But WaveGlow has a problem with reverb in the output. I'm trying to overcome this problem. https://github.com/Yeongtae/tacotron2 https://github.com/Yeongtae/waveglow

m-toman commented 5 years ago

Impressive. I also wanted to take a look at their repo, but I can't jump between them all the time ;). I've seen in the WaveGlow issues that training requires a lot of memory to achieve a reasonable batch size.
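(One common workaround is gradient accumulation, so a small per-step batch behaves like a larger effective batch. A minimal PyTorch sketch; the model, data loader and loss below are placeholders, not the actual WaveGlow training loop:)

    import torch

    # Gradient accumulation sketch: emulate a batch of accum_steps * per-step
    # size on limited GPU memory. model, data_loader and criterion are
    # placeholders, not the actual WaveGlow training code.
    accum_steps = 8
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    optimizer.zero_grad()
    for step, (mel, audio) in enumerate(data_loader):
        loss = criterion(model(mel.cuda(), audio.cuda()), audio.cuda())
        (loss / accum_steps).backward()   # scale so accumulated grads average
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()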

Yeongtae commented 5 years ago

So I'm using 8 V100 GPUs to train WaveGlow.

Yeongtae commented 5 years ago

@m-toman have you produced any results with this repository? Could you share a sample audio?

m-toman commented 5 years ago

@Yeongtae this is the current state for LJ; the main annoyance is the clipping issues, and more training alone doesn't seem to help. samples.zip This was trained from GTA mel specs with the settings in https://github.com/m-toman/WaveRNN-Pytorch/blob/master/hyperparams.py and https://github.com/m-toman/Tacotron-2/blob/master/hparams.py
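(If anyone wants to quantify it: a quick check of how many samples in a synthesized file hit the rails, assuming float waveforms in [-1, 1]; the filename is just an example:)

    import numpy as np
    import librosa

    # Count samples at or near the [-1, 1] limits in a synthesized file.
    wav, sr = librosa.load("synth_sample.wav", sr=None)
    clipped = np.sum(np.abs(wav) >= 0.999)
    print(f"{clipped} clipped samples ({100.0 * clipped / len(wav):.3f}%)")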

ZohaibAhmed commented 5 years ago

@m-toman - you mentioned clipping issues, and that's something I'm facing as well. Were you able to track down what causes the clipping?

m-toman commented 5 years ago

@ZohaibAhmed I was able to fix most issues by not using the noam learning rate scheduler, but setting the learning rate to a fixed value and manually lowering it when the loss starts to act funny. I also found that the simple model in https://github.com/h-meru/Tacotron-WaveRNN behaves much more benignly and trains more smoothly (with 10-bit quantization using "bits"; "mulaw" also seems to act up) than the alternative WaveRNN model by fatchord.
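(Concretely, "fixed and lowered manually" just means something like the following in PyTorch; the optimizer setup here is a placeholder, not the exact code of either repo:)

    import torch

    # Fixed learning rate instead of a noam schedule; lowered by hand
    # whenever the loss misbehaves. model/optimizer are placeholders.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    def lower_lr(optimizer, factor=10.0):
        # Divide the current learning rate by `factor`.
        for group in optimizer.param_groups:
            group["lr"] /= factor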

ZohaibAhmed commented 5 years ago

@m-toman Thanks. I see that in the repo you referred to, Tacotron uses narrow exponential decay, while for WaveRNN the learning rate is set to a fixed number. Which learning rate were you referring to when you said "manually lowering it"?

        #################################################################
        # Narrow Exponential Decay:

        # Phase 1: lr = 1e-3
        # We only start learning rate decay after 50k steps

        # Phase 2: lr in ]1e-5, 1e-3[
        # decay reach minimal value at step 310k

        # Phase 3: lr = 1e-5
        # clip by minimal learning rate value (step > 310k)
        #################################################################
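(For reference, those three phases translate roughly into the schedule below; the function name and the exact decay curve are my own reading of the comment, not code lifted from the repo:)

    def narrow_exponential_decay(step, init_lr=1e-3, final_lr=1e-5,
                                 start_decay=50_000, decay_steps=260_000):
        # Phase 1: constant lr until start_decay (50k steps).
        if step < start_decay:
            return init_lr
        # Phase 3: clip at the minimal value once decay has finished (step > 310k).
        if step >= start_decay + decay_steps:
            return final_lr
        # Phase 2: exponential interpolation from init_lr down to final_lr.
        progress = (step - start_decay) / decay_steps
        return init_lr * (final_lr / init_lr) ** progress
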
m-toman commented 5 years ago

WaveRNN - at the moment I'm starting out with 1e-4, and once the loss starts to act funny, I stop training and divide the LR by 10. I'm currently training with MoL output; there it worked well up to about 400k steps at batch size 128 before I had to lower it. Perhaps "reduce on plateau" or a similar scheduler would also be a good idea.
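(The "reduce on plateau" variant is available out of the box in PyTorch; a sketch, where the optimizer and the per-epoch loss computation are placeholders for the actual WaveRNN training loop:)

    import torch

    # ReduceLROnPlateau lowers the LR automatically when the monitored loss
    # stops improving; model, num_epochs and run_epoch are placeholders.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.1, patience=5)

    for epoch in range(num_epochs):
        val_loss = run_epoch(model, optimizer)   # hypothetical training helper
        scheduler.step(val_loss)                 # cuts the LR after a plateau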