yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
MIT License

Getting CUDA Out of memory error in Stage2 training #256

Closed. SandyPanda-MLDL closed this issue 2 months ago.

SandyPanda-MLDL commented 5 months ago

I keep getting a CUDA out-of-memory error while running stage 2 training. I am running the stage 2 training code on an NVIDIA L40S GPU.

File "train_second.py", line 827, in <module>
    main()
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
File "train_second.py", line 428, in main
    y_rec_gt_pred = model.decoder(en, F0_real, N_real, s)
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
File "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/Modules/hifigan.py", line 478, in forward
    x = self.generator(x, s, F0_curve)
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
File "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/Modules/hifigan.py", line 341, in forward
    xs += self.resblocks[i*self.num_kernels+j](x, s)
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
File "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/Modules/hifigan.py", line 67, in forward
    xt = n1(x, s)
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
File "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/Modules/hifigan.py", line 21, in forward
    h = self.fc(s)
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 7; 79.15 GiB total capacity; 2.32 GiB already allocated; 3.19 MiB free; 2.37 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Karesto commented 5 months ago

That just means you don't have enough memory in your GPU to run this. Try reducing batch_size and max_len in config.
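If you want to try that without hand-editing the YAML, here is a minimal sketch. It assumes the config has already been loaded into a plain dict (for example with PyYAML's yaml.safe_load) and uses the key layout of the StyleTTS2 config posted in this thread: top-level batch_size and max_len, plus slmadv_params.batch_percentage. The function name shrink_for_memory is made up for illustration.

```python
import copy
from typing import Optional


def shrink_for_memory(config: dict, batch_size: int, max_len: int,
                      batch_percentage: Optional[float] = None) -> dict:
    """Return a copy of a StyleTTS2-style config with smaller memory settings.

    Assumed layout (as in the config posted in this thread): top-level
    `batch_size` and `max_len`, and `slmadv_params.batch_percentage`.
    """
    if batch_size < 1 or max_len < 1:
        raise ValueError("batch_size and max_len must be positive")
    new = copy.deepcopy(config)  # leave the caller's config untouched
    new["batch_size"] = batch_size
    new["max_len"] = max_len  # maximum number of mel frames per sample
    if batch_percentage is not None:
        # Fraction of the batch used for the SLM adversarial step.
        new.setdefault("slmadv_params", {})["batch_percentage"] = batch_percentage
    return new
```

Working on a deep copy keeps the original config intact, so the smaller settings can be dumped to a separate YAML file and the two runs compared.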

SandyPanda-MLDL commented 5 months ago

But my batch_size is already 2 and batch_percentage is 0.5. Here is my config file:

log_dir: "/hdd2/Sandipan/SDhar-Projects/StyleTTS2/Models/New_Hindi_Speech_2nd"
first_stage_path: "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/Log_files/epoch_1st_00037.pth"
save_freq: 2
# save_freq: 2
log_interval: 10
device: "cuda"
# epochs_1st: 50
epochs_1st: 200 # number of epochs for first stage training (pre-training)
# epochs_2nd: 30
epochs_2nd: 100 # number of epochs for second stage training (joint training)
batch_size: 2
max_len: 100
# max_len: 100 # maximum number of frames
pretrained_model: ""
second_stage_load_pretrained: true # set to true if the pre-trained model is for 2nd stage
load_only_params: false # set to true if do not want to load epoch numbers and optimizer parameters

F0_path: "Utils/JDC/bst.t7"
ASR_config: "Utils/ASR/config.yml"
ASR_path: "Utils/ASR/epoch_00080.pth"
# /hdd5/Sandipan/SDhar-Projects/StyleTTS2/Utils/PLBERT_all_languages
PLBERT_dir: 'Utils/PLBERT_all_languages/'

# "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/Hindi_Data_Phoneme/val_list.txt"
data_params:
  train_data: "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/Hindi_Data_Phoneme/train.txt"
  val_data: "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/Hindi_Data_Phoneme/valid.txt"
  root_path: "/hdd2/Sandipan/database/Hindi_ASR_200/Hindi_Clean/"
  OOD_data: "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/Hindi_Data_Phoneme/odd.txt"
  min_length: 50 # sample until texts with this size are obtained for OOD texts

# data_params:
#   train_data: "Data/train_list_new.txt"
#   val_data: "Data/valid_list_new.txt"
#   root_path: "/hdd5/Sandipan/SDhar-Projects/Grad-TTS-Libri/Speech-Backbones/Grad-TTS/LJSpeech-1.1/wavs"
#   OOD_data: "Data/OOD_texts.txt"
#   min_length: 50 # sample until texts with this size are obtained for OOD texts

preprocess_params:
  sr: 24000
  spect_params:
    n_fft: 2048
    win_length: 1200
    hop_length: 300

model_params:
  multispeaker: true #true #false

  dim_in: 64
  hidden_dim: 512
  max_conv_dim: 512
  n_layer: 3
  n_mels: 80

  n_token: 178 # number of phoneme tokens
  max_dur: 50 # maximum duration of a single phoneme
  style_dim: 128 # style vector size

  dropout: 0.2

  ######### config for decoder
  # decoder:
  #   type: 'istftnet' # either hifigan or istftnet
  #   resblock_kernel_sizes: [3,7,11]
  #   upsample_rates : [10, 6]
  #   upsample_initial_channel: 512
  #   resblock_dilation_sizes: [[1,3,5], [1,3,5], [1,3,5]]
  #   upsample_kernel_sizes: [20, 12]
  #   gen_istft_n_fft: 20
  #   gen_istft_hop_size: 5

  ##############################
  decoder:
    type: 'hifigan' # either hifigan or istftnet
    resblock_kernel_sizes: [3,7,11]
    upsample_rates : [10,5,3,2]
    upsample_initial_channel: 512
    resblock_dilation_sizes: [[1,3,5], [1,3,5], [1,3,5]]
    upsample_kernel_sizes: [20,10,6,4]

  # speech language model config
  slm:
    model: 'microsoft/wavlm-base-plus'
    sr: 16000 # sampling rate of SLM
    hidden: 768 # hidden size of SLM
    nlayers: 13 # number of layers of SLM
    initial_channel: 64 # initial channels of SLM discriminator head

  # style diffusion model config
  diffusion:
    embedding_mask_proba: 0.1
    # transformer config
    transformer:
      num_layers: 3
      num_heads: 8
      head_features: 64
      multiplier: 2

    # diffusion distribution config
    dist:
      sigma_data: 0.2 # placeholder for estimate_sigma_data set to false
      estimate_sigma_data: true # estimate sigma_data from the current batch if set to true
      mean: -3.0
      std: 1.0

loss_params:
  lambda_mel: 5. # mel reconstruction loss
  lambda_gen: 1. # generator loss
  lambda_slm: 1. # slm feature matching loss

  lambda_mono: 1. # monotonic alignment loss (1st stage, TMA)
  lambda_s2s: 1. # sequence-to-sequence loss (1st stage, TMA)
  TMA_epoch: 50 # TMA starting epoch (1st stage)

  lambda_F0: 1. # F0 reconstruction loss (2nd stage)
  lambda_norm: 1. # norm reconstruction loss (2nd stage)
  lambda_dur: 1. # duration loss (2nd stage)
  lambda_ce: 20. # duration predictor probability output CE loss (2nd stage)
  lambda_sty: 1. # style reconstruction loss (2nd stage)
  lambda_diff: 1. # score matching loss (2nd stage)

  diff_epoch: 20 # style diffusion starting epoch (2nd stage)
  joint_epoch: 50 # joint training starting epoch (2nd stage)

optimizer_params:
  lr: 0.0001 # general learning rate
  bert_lr: 0.00001 # learning rate for PLBERT
  ft_lr: 0.00001 # learning rate for acoustic modules

slmadv_params:
  min_len: 100
  # min_len: 400 # minimum length of samples
  # max_len: 500 # maximum length of samples
  max_len: 200
  batch_percentage: 0.5 # to prevent out of memory, only use half of the original batch size
  # batch_percentage: 0.5 # to prevent out of memory, only use half of the original batch size
  iter: 10 # update the discriminator every this iterations of generator update
  thresh: 5 # gradient norm above which the gradient is scaled
  scale: 0.01 # gradient scaling factor for predictors from SLM discriminators
  sig: 1.5 # sigma for differentiable duration modeling
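For scale, assuming max_len counts mel frames as the config comment says: with sr: 24000 and hop_length: 300 in preprocess_params, each frame advances 300 samples (12.5 ms), so a max_len of 100 caps each training sample at 1.25 s of audio. The helper name below is hypothetical; it only spells out that conversion.

```python
def frames_to_seconds(n_frames: int, hop_length: int = 300, sr: int = 24000) -> float:
    """Audio duration spanned by n_frames mel frames (one frame per hop_length samples)."""
    if sr <= 0 or hop_length <= 0:
        raise ValueError("sr and hop_length must be positive")
    return n_frames * hop_length / sr


# Defaults mirror preprocess_params in the config above.
print(frames_to_seconds(100))  # 1.25
print(frames_to_seconds(400))  # 5.0
```

Activation memory grows roughly with both batch_size and the number of frames per sample, which is why those two settings are the usual first knobs to turn down.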

Karesto commented 5 months ago

I assume this happens right at the beginning. The error says: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 7; 79.15 GiB total capacity; 2.32 GiB already allocated; 3.19 MiB free; 2.37 GiB reserved in total by PyTorch). So PyTorch could only get 2.37 GiB. Is anything else running on your GPU?
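The arithmetic behind this question can be read straight off the message: memory that is neither reserved by this PyTorch process nor free must be held by something else. A minimal stdlib sketch, assuming the exact message format quoted in this thread (request and free memory in MiB, everything else in GiB; PyTorch may print other units for other magnitudes). The regex and the function name are illustrative, not part of PyTorch or StyleTTS2.

```python
import re

# Summary part of the OOM messages quoted in this thread.
# Assumption: "Tried to allocate" and "free" are in MiB, the rest in GiB.
OOM_RE = re.compile(
    r"Tried to allocate (?P<request>[\d.]+) MiB \(GPU (?P<gpu>\d+); "
    r"(?P<total>[\d.]+) GiB total capacity; "
    r"(?P<allocated>[\d.]+) GiB already allocated; "
    r"(?P<free>[\d.]+) MiB free; "
    r"(?P<reserved>[\d.]+) GiB reserved in total by PyTorch\)"
)


def held_outside_pytorch_gib(message: str) -> float:
    """GiB on the reported device that is neither free nor reserved by this process."""
    m = OOM_RE.search(message)
    if m is None:
        raise ValueError("unrecognised OOM message format")
    total = float(m.group("total"))
    reserved = float(m.group("reserved"))
    free = float(m.group("free")) / 1024  # MiB -> GiB
    return total - reserved - free
```

For the message above this comes to roughly 76.8 GiB held by something other than this process. The 79.15 GiB total also does not match the 46068 MiB L40S that nvidia-smi lists as GPU 7 later in the thread, which is consistent with the device-numbering confusion the thread eventually uncovers.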

SandyPanda-MLDL commented 5 months ago

Actually, I am running my code on our lab server, which has 8 GPUs, 4-5 of which are already in use by other people's jobs. I am running my code on a specific GPU (id 7), which nobody else is using right now.

SandyPanda-MLDL commented 5 months ago

Output of nvidia-smi for GPU 7, the one I am using:

|   7  NVIDIA L40S                  Off | 00000000:24:00.0 Off |                    0 |
| N/A   36C    P8            23W / 350W |      3MiB / 46068MiB |      0%      Default |
|                                       |                      |                  N/A |

Karesto commented 5 months ago

There seems to be an issue somewhere, but I can't quite put my finger on it. GPU 7 appears to be a 48 GB card, yet PyTorch says it's an 80 GB one? What command are you using to run the code? In some places the code uses .to("cuda") instead of .to(device); changing those might help.
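One way to follow up on this is to ask PyTorch which card each index maps to and compare the result with nvidia-smi --query-gpu=index,name,memory.total --format=csv. A minimal sketch; the function names and the 3% tolerance are arbitrary choices for illustration, and torch is imported inside the function so the comparison helper works without PyTorch installed.

```python
import math
from typing import List, Tuple


def likely_same_card(torch_total_gib: float, smi_total_mib: float,
                     rel_tol: float = 0.03) -> bool:
    """Rough check that a PyTorch total and an nvidia-smi total describe the same GPU model."""
    return math.isclose(torch_total_gib, smi_total_mib / 1024, rel_tol=rel_tol)


def torch_devices() -> List[Tuple[int, str, float]]:
    """(index, name, total GiB) for every CUDA device this process can see."""
    import torch  # imported lazily so likely_same_card works without PyTorch

    rows = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        rows.append((i, props.name, props.total_memory / 1024**3))
    return rows
```

With the numbers in this thread, likely_same_card(79.15, 46068) is False: a device reporting about 79 GiB cannot be the 46068 MiB L40S, so PyTorch's index 7 and nvidia-smi's GPU 7 are apparently different cards.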

SandyPanda-MLDL commented 5 months ago

Whenever I change the specific lines where the error was raised, the same error shows up at different lines on the next run. For example:

File "train_second.py", line 827, in <module>
    main()
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
File "train_second.py", line 417, in main
    s = model.style_encoder(st.unsqueeze(1) if multispeaker else gt.unsqueeze(1))
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
File "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/models.py", line 167, in forward
    h = self.shared(x)
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
File "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/models.py", line 143, in forward
    x = self._shortcut(x) + self._residual(x)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 7; 79.15 GiB total capacity; 2.36 GiB already allocated; 9.19 MiB free; 2.37 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

SandyPanda-MLDL commented 5 months ago

I am simply running python train_second.py.

SandyPanda-MLDL commented 5 months ago

This is how I set the device ID; I then use to(device) wherever it is needed in the code:

device_id = 7
device = torch.device(device_id if torch.cuda.is_available() else "cpu")

SandyPanda-MLDL commented 5 months ago

In my code I have already replaced every to("cuda") with to(device).

Karesto commented 5 months ago

This doesn't seem to be related to StyleTTS; I tried something similar and it worked fine. Have you tried setting device to plain "cuda" and using CUDA_VISIBLE_DEVICES=7?
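A minimal sketch of this suggestion done from inside Python, assuming it runs at the very top of train_second.py: the variables must take effect before CUDA is initialised, and setting them before importing torch is the safe choice. It also sets CUDA_DEVICE_ORDER=PCI_BUS_ID, an addition not mentioned in the thread: by default CUDA enumerates devices fastest-first, while nvidia-smi numbers them by PCI bus ID, so on a server with mixed GPU models the same index can refer to different cards. That is one plausible explanation, not confirmed here, for CUDA_VISIBLE_DEVICES landing on unexpected GPUs. The function name pin_physical_gpu is hypothetical.

```python
import os
from typing import MutableMapping


def pin_physical_gpu(index: int, env: MutableMapping[str, str] = os.environ) -> None:
    """Expose a single GPU to CUDA, numbered the way nvidia-smi numbers it.

    Call before CUDA is initialised (safest: before `import torch`);
    changing these variables afterwards does not affect the running process.
    """
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # match nvidia-smi's ordering
    env["CUDA_VISIBLE_DEVICES"] = str(index)
```

After pin_physical_gpu(7), only that card is visible and it appears inside the process as cuda:0, so the config can keep device: "cuda". The shell equivalent is CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=7 python train_second.py.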

SandyPanda-MLDL commented 5 months ago

No, let me try that.

SandyPanda-MLDL commented 5 months ago

Thank you. It seems the problem was on my end, with the GPU I was specifying. I had been using CUDA_VISIBLE_DEVICES and setting different GPU IDs whenever I found an idle GPU on our server, but CUDA_VISIBLE_DEVICES=<GPU id> ran my code on other GPUs instead of the one I specified. That's why I set the GPU ID in code with device_id=7 and device = torch.device(device_id if torch.cuda.is_available() else "cpu"). However, I was still getting the CUDA out-of-memory error.

But this time, when I ran CUDA_VISIBLE_DEVICES=5 python train_second.py, the code started running. I guess I have to find a working GPU ID this way, by trial and error.

Thanks for your suggestion.

martinambrus commented 2 months ago

@SandyPanda-MLDL would you mind closing this issue if it's resolved please?

SandyPanda-MLDL commented 2 months ago

Sure

On Sat, 31 Aug 2024, 20:30 Martin Ambrus, @.***> wrote:
