jik876 / hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
MIT License
1.88k stars 500 forks

Fine-tuning from pretrained models? #70

Open turian opened 3 years ago

turian commented 3 years ago

The pretrained models have names like: generator_v1

However, train.py looks for checkpoints with the following code:


    if os.path.isdir(a.checkpoint_path):
        cp_g = scan_checkpoint(a.checkpoint_path, 'g_')
        cp_do = scan_checkpoint(a.checkpoint_path, 'do_')

Where are the pretrained checkpoints we can use for fine-tuning? Or can you clarify how to invoke the script for fine-tuning?
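For reference, here is a minimal sketch of what `scan_checkpoint` presumably does (the glob pattern and sort order are my assumptions, not the repo's exact code), which also shows why a file named `generator_v1` is never found:

```python
import glob
import os

def scan_checkpoint(cp_dir, prefix):
    """Return the most recent checkpoint in cp_dir whose filename starts
    with `prefix`, or None. Filenames are assumed to end in a zero-padded
    step count, so a plain lexicographic sort picks the newest."""
    cp_list = sorted(glob.glob(os.path.join(cp_dir, prefix + '*')))
    return cp_list[-1] if cp_list else None

# A pretrained file named generator_v1 does not match the 'g_' prefix, so
# this scan skips it. Renaming it to e.g. g_00000000 (and the matching
# discriminator file to do_00000000) inside --checkpoint_path would make
# train.py treat it as the latest checkpoint and resume from it.
```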

EmreOzkose commented 3 years ago

I think all models except the universal one were shared just to make a demo, in which case the fine-tuning code path is redundant. Otherwise, we would need to change the code you quoted so it loads the generator and starts from epoch 0 with the pretrained generator weights.

turian commented 3 years ago

Could you explain what you mean?

If I am using UNIVERSAL to fine-tune for a particular use case, should I change last_epoch to 0 in train.py? I'm concerned that I'm starting with a learning_rate that has already had a very large ExponentialLR lr_decay applied to it, and that maybe I should start with the original learning rate instead?
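For a sense of scale, a quick back-of-the-envelope calculation. The lr0 and gamma values below are taken from config_v1.json as I read it, and the epoch counts are made up for illustration, not the universal checkpoint's actual training state:

```python
# With ExponentialLR, the LR after `last_epoch` epochs is
# lr0 * gamma ** last_epoch, so the decay compounds quickly.
lr0, gamma = 2e-4, 0.999

for epoch in (0, 500, 3000):
    print(epoch, lr0 * gamma ** epoch)
# epoch 0    -> 2e-4        (full LR)
# epoch 500  -> ~1.21e-4
# epoch 3000 -> ~9.9e-6, roughly 20x smaller than the initial LR
```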

jmasterx commented 3 years ago

https://drive.google.com/drive/folders/1YuOoV3lO2-Hhn1F2HJ2aQ4S0LC1JdKLd

turian commented 3 years ago

@jmasterx yes, I have seen the universal model. If you are fine-tuning from the universal model using the existing config, since it has already done many epochs, does the fine-tuning start with a very low LR? Is the ExponentialLR lr_decay applied? Should I change last_epoch to 0 in train.py so that fine-tuning starts with the initial LR? Or do I want to fine-tune from a very small LR that has been exponentially decayed?

turian commented 3 years ago

My question is more about the details of how fine-tuning is applied. Does it typically use the current LR at this epoch? Or is the LR reset? I can't find any written info about how fine-tuning is applied in practice.

jmasterx commented 3 years ago

The schedulers are created like this:

    scheduler_g = torch.optim.lr_scheduler.ExponentialLR(optim_g, gamma=h.lr_decay, last_epoch=last_epoch)
    scheduler_d = torch.optim.lr_scheduler.ExponentialLR(optim_d, gamma=h.lr_decay, last_epoch=last_epoch)

and the epoch info is loaded from the state dict: last_epoch = state_dict_do['epoch']

So it would make sense to me that it picks back up where it left off with the decayed LR. And if you continue training it on a new dataset, I would think the LR decay would still apply, so it would learn much more slowly than at step 0.

If you want to train it from scratch but starting with the Universal weights, then probably clear the steps and last_epoch? (don't load them)
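The two options above can be sketched with a toy module (the Linear layer stands in for the HiFi-GAN generator; this is illustrative, not the repo's training loop):

```python
import torch

gamma = 0.999
net = torch.nn.Linear(80, 80)  # stand-in for the generator
opt = torch.optim.AdamW(net.parameters(), lr=2e-4)
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=gamma)

# Option A: resume with the loaded epoch count. Simulate a checkpoint
# that has already trained 100 epochs by stepping the scheduler:
for _ in range(100):
    sched.step()
print(opt.param_groups[0]['lr'])  # decayed: 2e-4 * 0.999**100 (~1.81e-4)

# Option B (don't load steps/last_epoch): rebuild the optimizer and
# scheduler with the default last_epoch=-1, so fine-tuning restarts
# from the full initial LR while keeping the pretrained weights.
opt2 = torch.optim.AdamW(net.parameters(), lr=2e-4)
sched2 = torch.optim.lr_scheduler.ExponentialLR(opt2, gamma=gamma)
print(opt2.param_groups[0]['lr'])  # back to 2e-4
```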

turian commented 3 years ago

@jmasterx thanks, that was my thought too. Any idea what was done by the authors when they released the fine-tuned models? @jik876 ?

EmreOzkose commented 3 years ago

> Could you explain what you mean?
>
> If I am using UNIVERSAL to fine-tune for a particular use case, should I change last_epoch to 0 in train.py? I'm concerned that I'm starting with a learning_rate that has already had a very large ExponentialLR lr_decay applied to it, and that maybe I should start with the original learning rate instead?

I fine-tuned UNIVERSAL, and it is not required to set last_epoch=0; it automatically continues. The universal one is already okay to use. However, I don't understand the other models. I loaded the generator and trained a model, but it seems to start from zero, not like fine-tuning.
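A hedged sketch of the generator-only warm start just described, for when no matching do_ file exists. `DummyGenerator` stands in for the repo's Generator class, and the 'generator' key is my assumption about the g_ checkpoint layout, worth verifying against your file:

```python
import os
import tempfile
import torch

class DummyGenerator(torch.nn.Module):
    """Stand-in for the HiFi-GAN Generator."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(80, 80)

# Pretend this file is the pretrained g_ checkpoint:
pretrained = DummyGenerator()
ckpt_path = os.path.join(tempfile.mkdtemp(), 'g_pretrained')
torch.save({'generator': pretrained.state_dict()}, ckpt_path)

# Warm-start only the generator weights:
gen = DummyGenerator()
state_dict_g = torch.load(ckpt_path, map_location='cpu')
gen.load_state_dict(state_dict_g['generator'])

# The discriminator, optimizers, steps and epoch are all rebuilt from
# scratch, so logging shows step 0 and the full LR even though the
# generator weights are pretrained -- which is why it "looks" like
# training from zero.
steps, last_epoch = 0, -1
```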

thepowerfuldeez commented 3 years ago

@EmreOzkose I guess that's because the author shared the discriminator only for the universal model

turian commented 3 years ago

@thepowerfuldeez The universal model has both generator and discriminator?

jmasterx commented 3 years ago

Yes, the link I provided contains both the do file (discriminator weights and optimizer state) and the g file (generator weights)

turian commented 3 years ago

@jmasterx When fine-tuning on an existing model, like tacotron2, did the steps and LR schedule start again at 0, or at the epoch where the fine-tuning started?

Thank you for the link

caixxiong commented 3 years ago

> I think all models except the universal one were shared just to make a demo, in which case the fine-tuning code path is redundant. Otherwise, we would need to change the code you quoted so it loads the generator and starts from epoch 0 with the pretrained generator weights.

It seems that fine-tuning is performed only for end-to-end TTS synthesis, i.e., it fine-tunes the model on mel-spectrograms synthesized by the TTS acoustic model, such as Tacotron 2.
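A sketch of preparing that fine-tuning dataset: for each training utterance, the acoustic model's predicted mel (e.g. from teacher-forced Tacotron 2) is saved as a `.npy` file sharing the wav's basename, so the fine-tuning mode can pair it with the ground-truth audio. The `ft_dataset` directory name and pairing convention follow my reading of the repo's README; the filename and mel shape below are made up:

```python
import os
import tempfile
import numpy as np

ft_dir = tempfile.mkdtemp()                        # stand-in for ft_dataset/
wav_name = 'LJ001-0001.wav'                        # hypothetical utterance
mel = np.random.randn(80, 812).astype(np.float32)  # stand-in Tacotron 2 output

# Save with the same basename as the wav, but with a .npy extension:
np.save(os.path.join(ft_dir, os.path.splitext(wav_name)[0] + '.npy'), mel)

loaded = np.load(os.path.join(ft_dir, 'LJ001-0001.npy'))
print(loaded.shape)  # (80, 812): (n_mels, frames)
```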

tylerweitzman commented 2 years ago

Is it possible to fine-tune VCTK_V2 using the discriminator from UNIVERSAL_V1?