yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Resuming finetuning uses second to last epoch #238

Open SimonDemarty opened 6 months ago

SimonDemarty commented 6 months ago

Hello there,

First of all, thank you for the great model! I noticed something strange while finetuning: resuming a finetuning run actually restarts one epoch before the one specified.

Steps to reproduce

I finetuned the model using: accelerate launch --mixed_precision=fp16 --num_processes=1 train_finetune_accelerate.py --config_path <path/to/config.yml>

So far, everything went well, until the finetuning crashed (this was to be expected with the parameters I chose). The last epoch in progress before the crash was [11/100]:

Epoch [11/100], Step [36/685], Loss: 0.29078, Disc Loss: 3.66850, Dur Loss: 0.34166, CE Loss: 0.01789, Norm Loss: 0.43014, F0 Loss: 1.61470, LM Loss: 1.13319, Gen Loss: 7.04091, Sty Loss: 0.08961, Diff Loss: 0.47579, DiscLM Loss: 0.00000, GenLM Loss: 0.00000, SLoss: 0.00000, S2S Loss: 0.04884, Mono Loss: 0.07268
Time elasped: 85.4826250076294
Epoch [11/100], Step [37/685], Loss: 0.22887, Disc Loss: 3.83498, Dur Loss: 0.33926, CE Loss: 0.01526, Norm Loss: 0.20141, F0 Loss: 0.96320, LM Loss: 0.77488, Gen Loss: 6.66455, Sty Loss: 0.08425, Diff Loss: 0.59685, DiscLM Loss: 0.00000, GenLM Loss: 0.00000, SLoss: 0.00000, S2S Loss: 0.00954, Mono Loss: 0.11665
Time elasped: 87.4490795135498

At this point, the crash happened while working on the 11th epoch, so the last completed one was the 10th, saved as epoch_2nd_00009.pth.
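As a side note on the naming: the log prints a one-based epoch number while the checkpoint filename appears to use a zero-based counter, which is why the 10th completed epoch lands in epoch_2nd_00009.pth. A tiny sketch of that mapping (variable names are mine, not taken from train_finetune_accelerate.py):

```python
# Sketch of 0-based checkpoint naming vs. 1-based log display.
# Names are illustrative, not from the actual training script.
n_epochs = 100
for epoch in range(9, 11):                              # 0-based loop counter
    print(f"Epoch [{epoch + 1}/{n_epochs}]")            # log shows the 1-based number
    print("would save:", "epoch_2nd_%05d.pth" % epoch)  # 10th epoch -> epoch_2nd_00009.pth
```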

Then I modified the config.yml, setting the following parameters:

pretrained_model: "path/to/epoch_2nd_00009.pth"
load_only_params: false
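For context, my understanding is that with load_only_params: false the script restores not only the weights but also the optimizer state and the epoch counter stored in the checkpoint. A minimal sketch of such a resume path (the state-dict keys and return convention are assumptions, not verified against the repo):

```python
import torch

def load_checkpoint(path, model, optimizer, load_only_params=False):
    # Sketch only: the 'net', 'optimizer' and 'epoch' keys are assumed,
    # not taken from the actual StyleTTS2 checkpoint format.
    state = torch.load(path, map_location='cpu')
    model.load_state_dict(state['net'])
    if load_only_params:
        return 0                                   # start the epoch counter fresh
    optimizer.load_state_dict(state['optimizer'])  # resume optimizer state as well
    return state['epoch']                          # epoch index saved at checkpoint time
```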

Back in the terminal, I reran the command: accelerate launch --mixed_precision=fp16 --num_processes=1 train_finetune_accelerate.py --config_path <path/to/config.yml>

What is now displayed is:

Epoch [10/100], Step [1/685], Loss: 0.31872, Disc Loss: 3.71658, Dur Loss: 0.43845, CE Loss: 0.01963, Norm Loss: 0.27553, F0 Loss: 1.27736, LM Loss: 0.93941, Gen Loss: 7.35116, Sty Loss: 0.00000, Diff Loss: 0.00000, DiscLM Loss: 0.00000, GenLM Loss: 0.00000, SLoss: 0.00000, S2S Loss: 0.05938, Mono Loss: 0.05683
Time elasped: 2.4969496726989746

I waited a bit and saw that no new epoch checkpoint appeared; I assume it will save over epoch_2nd_00009.pth again.

Conclusion

This means that resuming finetuning probably loads the right checkpoint (the one referenced in the config.yml) but resumes under the wrong epoch number (i.e. 10 instead of 11). Consequently, it might also apply the wrong epoch-dependent parameters (e.g. settings gated on diff_epoch=10 would be evaluated against epoch 10 while [11/100] should actually be training, if I understand correctly).
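If that reading is right, the culprit would be an off-by-one where the resumed loop starts at the loaded epoch instead of one past it. An illustrative sketch (not the actual StyleTTS2 source):

```python
# Assume the checkpoint stores the 0-based index of the last *finished* epoch,
# so epoch_2nd_00009.pth -> 9, which the log displays as [10/100].
loaded_epoch = 9

buggy_start = loaded_epoch        # re-runs the finished epoch: log shows [10/100]
fixed_start = loaded_epoch + 1    # continues with [11/100] as intended

# An epoch-gated loss (e.g. `if epoch >= diff_epoch:`) then switches off again
# on resume, which would match the Diff Loss: 0.00000 lines above.
diff_epoch = 10                   # hypothetical config value
print("diffusion on (buggy):", buggy_start >= diff_epoch)   # False
print("diffusion on (fixed):", fixed_start >= diff_epoch)   # True
```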

martinambrus commented 2 months ago

I've fixed this in the following commit: https://github.com/martinambrus/StyleTTS2/commit/bf7ea7172d83db9c0e5b414fdf5e304aa3f4b848

Feel free to update those 4 files if you need this fix as well. The repo owner is currently inactive and pull requests don't seem to be reviewed anymore, so I didn't bother creating one.