yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Errors when starting train_first. I have 96 GB RAM, 3× P40 (24 GB) and 1× T4 (16 GB) #178

Closed casic closed 8 months ago

casic commented 11 months ago

LJSpeech-1.1/wavs/LJ043-0101.wav 22050
LJSpeech-1.1/wavs/LJ028-0188.wav 22050
LJSpeech-1.1/wavs/LJ018-0108.wav 22050
LJSpeech-1.1/wavs/LJ032-0211.wav 22050
LJSpeech-1.1/wavs/LJ034-0082.wav 22050
LJSpeech-1.1/wavs/LJ016-0002.wav 22050
LJSpeech-1.1/wavs/LJ013-0165.wav 22050
LJSpeech-1.1/wavs/LJ046-0247.wav 22050
LJSpeech-1.1/wavs/LJ017-0130.wav 22050
LJSpeech-1.1/wavs/LJ013-0176.wav 22050
LJSpeech-1.1/wavs/LJ042-0162.wav 22050
LJSpeech-1.1/wavs/LJ029-0201.wav 22050
LJSpeech-1.1/wavs/LJ016-0139.wav 22050
LJSpeech-1.1/wavs/LJ017-0258.wav 22050
LJSpeech-1.1/wavs/LJ004-0135.wav 22050
LJSpeech-1.1/wavs/LJ016-0149.wav 22050
LJSpeech-1.1/wavs/LJ024-0108.wav 22050
LJSpeech-1.1/wavs/LJ007-0078.wav 22050
LJSpeech-1.1/wavs/LJ014-0157.wav 22050
LJSpeech-1.1/wavs/LJ047-0208.wav 22050
LJSpeech-1.1/wavs/LJ013-0240.wav 22050
LJSpeech-1.1/wavs/LJ028-0059.wav 22050
Traceback (most recent call last):
  File "/home/koce/StyleTTS/train_first.py", line 393, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
Traceback (most recent call last):
  File "/home/koce/StyleTTS/train_first.py", line 214, in main
  File "/home/koce/StyleTTS/train_first.py", line 393, in <module>
    loss_reg = r1_reg(out, gt)
  File "/home/koce/StyleTTS/utils.py", line 60, in r1_reg
    grad_dout = torch.autograd.grad(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 394, in grad
    main()
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    result = Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 78.00 MiB. GPU 0 has a total capacty of 23.87 GiB of which 25.62 MiB is free. Process 1178071 has 7.50 GiB memory in use. Process 1178069 has 10.88 GiB memory in use. Process 1178070 has 5.46 GiB memory in use. Of the allocated memory 10.44 GiB is allocated by PyTorch, and 249.36 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/koce/StyleTTS/train_first.py", line 207, in main
    mel_rec = model.decoder(en, F0_real, real_norm, s)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/koce/StyleTTS/models.py", line 461, in forward
    x = block(x, s)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/koce/StyleTTS/models.py", line 407, in forward
    out = self._residual(x, s)
  File "/home/koce/StyleTTS/models.py", line 403, in _residual
    x = self.conv2(self.dropout(x))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1557, in _call_impl
    args_result = hook(self, args)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/utils/weight_norm.py", line 67, in __call__
    setattr(module, self.name, self.compute_weight(module))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/utils/weight_norm.py", line 26, in compute_weight
    return _weight_norm(v, g, self.dim)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 23.87 GiB of which 9.62 MiB is free. Process 1178071 has 7.50 GiB memory in use. Process 1178069 has 10.89 GiB memory in use. Process 1178070 has 5.46 GiB memory in use. Of the allocated memory 7.08 GiB is allocated by PyTorch, and 254.02 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

martinambrus commented 11 months ago

You're running out of memory, quite possibly because of your config settings.

On a T4 GPU you won't be able to train much, since it has very limited VRAM (16 GB). The best I could do with a T4 on Google Colab was to fine-tune the LJSpeech model with my own set of WAV files 1 to 1.25 seconds long, setting batch_size to 2 and max_len to 100 in config.yml.
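For reference, those two settings in config.yml look something like this (a minimal excerpt; every other option stays at its default):

```yaml
# config.yml (excerpt) – values that fit into 16 GB of VRAM in my runs
batch_size: 2   # samples per training step
max_len: 100    # maximum segment length used per sample
```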

You can check out the Colab notebook at https://github.com/yl4579/StyleTTS2/blob/main/Colab/StyleTTS2_Finetune_Demo.ipynb for an example of how to fine-tune on a T4, or the other Colab notebooks at https://github.com/yl4579/StyleTTS2/tree/main/Colab for inspiration.

casic commented 11 months ago

Thanks.
But is there a way to fine-tune on all GPUs? Thanks in advance!

martinambrus commented 11 months ago

No, this is not yet possible due to issue #7, and because the fine-tuning script is built on top of the second-stage training script, which suffers from the same issue. The best you can get is accelerated fine-tuning on a single process, which is marginally faster and uses a little less memory, but not by a lot: accelerate launch --mixed_precision=fp16 --num_processes=1 train_finetune_accelerate.py --config_path ./Configs/config_ft.yml

casic commented 11 months ago

Thanks. One more question: I started first-stage training and it ran for about 10 hours, then the terminal killed the process. I see it stopped at epoch 7/200. Is there a way to continue after a stop, rather than starting from zero every time?

martinambrus commented 11 months ago

Yes, the training process saves checkpoint files, as often as you set in the config via the save_freq option. It defaults to 2, so it saves files such as StyleTTS2/Models/LJSpeech/epoch_1st_00002.pth every two epochs, and you can use those to continue from where you left off.
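As an illustration, assuming the default LJSpeech config, the relevant part of config.yml looks roughly like this:

```yaml
# config.yml (excerpt) – checkpointing
log_dir: "Models/LJSpeech"   # checkpoints are written here, e.g. epoch_1st_00002.pth
save_freq: 2                 # save a checkpoint every 2 epochs
```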

To do so, just set the pretrained_model config option to the full path of the last checkpoint file (e.g. /home/user/StyleTTS2/Models/LJSpeech/epoch_1st_00007.pth), the second_stage_load_pretrained parameter to false, and the load_only_params option to false. Then start training and it should pick up from the given checkpoint file.

If you want to resume second-stage training, you'll need to provide a second-stage checkpoint file and set the second_stage_load_pretrained parameter to true.
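Putting it together, resuming first-stage training from the epoch-7 checkpoint above would use a config along these lines (a sketch; the path depends on your setup):

```yaml
# config.yml (excerpt) – resume first-stage training from a saved checkpoint
pretrained_model: "/home/user/StyleTTS2/Models/LJSpeech/epoch_1st_00007.pth"
second_stage_load_pretrained: false   # set to true only when resuming second-stage training
load_only_params: false               # also restore optimizer state and epoch counter
```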

casic commented 11 months ago

Thank you very much, Martin. Have a nice year.

yl4579 commented 10 months ago

Sorry for the late reply. I was quite busy recently. Finetuning should use all GPUs. I have tested the finetuning script on 4 NVidia A100 and it worked perfectly well. Have you checked using nvidia-smi that all GPUs were being used when you run the script, or only one of them?

martinambrus commented 10 months ago

> Sorry for the late reply. I was quite busy recently. Finetuning should use all GPUs. I have tested the finetuning script on 4 NVidia A100 and it worked perfectly well. Have you checked using nvidia-smi that all GPUs were being used when you run the script, or only one of them?

Then is the following statement from the README incorrect or did I simply misunderstand what you meant there?

> The script is modified from train_second.py which uses DP, as DDP does not work for train_second.py. Please see the bold section above if you are willing to help with this problem. ... If you are using a single GPU (because the script doesn't work with DDP) and want to save training speed and VRAM, you can do (thank @korakoe for making the script at https://github.com/yl4579/StyleTTS2/pull/100): accelerate launch --mixed_precision=fp16 --num_processes=1 train_finetune_accelerate.py --config_path ./Configs/config_ft.yml

teamblubee commented 10 months ago

The fine-tuning script is a modified version of the second-stage training, so it might work, but I don't think so. It would help if the author could create a post showing where train_second fails.

Once that's posted then I can take some time to examine it and see if I can debug the issue.

casic commented 10 months ago

I started it without accelerate and it starts. Thanks for the replies!
