huggingface / parler-tts

Inference and training library for high-quality TTS models.
Apache License 2.0
4.23k stars, 418 forks

OutOfMemoryError Encountered During Dataset Preprocessing #75

Closed LiuZH-19 closed 3 months ago

LiuZH-19 commented 3 months ago

Hello, I encountered an OutOfMemoryError while preprocessing my own dataset on an A100, during the "Encoding target audio with encodec" step. My dataset is only about 1/100 the size of yours, and when I reproduced your work with your data, everything ran fine. I load my dataset from a 'json' file that contains the full paths to the audio files, then convert the audio column to the Audio type. Do I need to save the dataset locally or push it to the Hub first? Could that be causing the issue? And how should I preprocess a large dataset?

gathered_tensor tensor([0], device='cuda:0')                                                                                                                
Filter (num_proc=8): 100%|██████████| 3609/3609 [00:14<00:00, 255.72 examples/s]                                                                            
Filter (num_proc=8): 100%|██████████| 96/96 [00:13<00:00,  6.98 examples/s]                                                                                 
preprocess datasets (num_proc=8): 100%|██████████| 3589/3589 [00:13<00:00, 267.74 examples/s]                                                               
preprocess datasets (num_proc=8): 100%|██████████| 95/95 [00:14<00:00,  6.57 examples/s]                                                                    
06/18/2024 22:58:48 - INFO - __main__ - *** Encode target audio with encodec ***                                                                            
  0%|          | 0/180 [00:00<?, ?it/s]torch/nn/modules/conv.py:306: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
  return F.conv1d(input, weight, bias, self.stride,
 20%|██        | 36/180 [00:46<04:22,  1.82s/it]torch/nn/modules/conv.py:306: UserWarning: Plan failed with an OutOfMemoryError: CUDA out of memory. Tried to allocate 15.24 GiB. GPU  (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:924.)
  return F.conv1d(input, weight, bias, self.stride,
torch/nn/modules/conv.py:306: UserWarning: Plan failed with an OutOfMemoryError: CUDA out of memory. Tried to allocate 7.62 GiB. GPU  (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:924.)
  return F.conv1d(input, weight, bias, self.stride,
torch/nn/modules/conv.py:306: UserWarning: Plan failed with an OutOfMemoryError: CUDA out of memory. Tried to allocate 30.49 GiB. GPU  (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:924.)
  return F.conv1d(input, weight, bias, self.stride,
torch/nn/modules/conv.py:306: UserWarning: Plan failed with an OutOfMemoryError: CUDA out of memory. Tried to allocate 22.87 GiB. GPU  (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:924.)
  return F.conv1d(input, weight, bias, self.stride,
torch/nn/modules/conv.py:306: UserWarning: Plan failed with an OutOfMemoryError: CUDA out of memory. Tried to allocate 15.25 GiB. GPU  (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:924.)
  return F.conv1d(input, weight, bias, self.stride,
 21%|██        | 38/180 [00:57<03:34,  1.51s/it]
Traceback (most recent call last):                                                                                                                          
  File "/parler-tts/./training/run_parler_tts_training_local.py", line 1039, in <module>                                              
    main()                                                                                                                                                  
  File "/parler-tts/./training/run_parler_tts_training_local.py", line 436, in main                                                   
    generate_labels = apply_audio_decoder(batch)                                                                                                            
  File "/parler-tts/./training/run_parler_tts_training_local.py", line 415, in apply_audio_decoder                                    
    labels = audio_decoder.encode(**batch, bandwidth=bandwidth)["audio_codes"]                                                                              
  File "/parler-tts/parler_tts/dac_wrapper/modeling_dac.py", line 87, in encode
    _, encoded_frame, _, _, _ = self.model.encode(frame, n_quantizers=n_quantizers)                                                                         
  File "dac/model/dac.py", line 243, in encode
    z = self.encoder(audio_data)                                              
  File "torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)                                   
  File "torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)                                      
  File "dac/model/dac.py", line 91, in forward
    return self.block(x)                                                      
  File "torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)                                   
  File "torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)                                      
  File "torch/nn/modules/container.py", line 217, in forward
    input = module(input)                                                     
  File "torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)                                   
  File "torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)                                      
  File "dac/model/dac.py", line 61, in forward
    return self.block(x)                                                      
  File "torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)                                   
  File "torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)                                      
  File "torch/nn/modules/container.py", line 217, in forward
    input = module(input)                                                     
  File "torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)                                   
  File "torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)      
  File "torch/nn/modules/container.py", line 217, in forward
    input = module(input)                                                     
  File "torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)                                   
  File "torch/nn/modules/module.py", line 1582, in _call_impl
    result = forward_call(*args, **kwargs)                                    
  File "torch/nn/modules/conv.py", line 310, in forward
    return self._conv_forward(input, self.weight, self.bias)                                                                                                
  File "torch/nn/modules/conv.py", line 306, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,                                                                                                       
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.09 GiB. GPU

LiuZH-19 commented 3 months ago

I have confirmed that the OutOfMemoryError is due to an excessively long audio file in my dataset.
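
For anyone hitting the same error, here is a minimal sketch of how overly long clips could be flagged before the encoding step. It is not taken from the parler-tts training script. The record layout (`path`, `num_samples`, `sampling_rate`), the helper names, and the 30-second limit are all assumptions made for illustration, so adjust them to your own data and GPU memory.

```python
from typing import Dict, List, Tuple


def duration_seconds(num_samples: int, sampling_rate: int) -> float:
    """Return the clip length in seconds from its sample count and sampling rate."""
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be positive")
    return num_samples / sampling_rate


def split_by_duration(
    clips: List[Dict], max_seconds: float
) -> Tuple[List[str], List[str]]:
    """Split clip paths into (kept, dropped) based on a maximum duration.

    Each clip is a hypothetical record with 'path', 'num_samples' and
    'sampling_rate' keys; this layout is an assumption, not a library format.
    """
    kept, dropped = [], []
    for clip in clips:
        length = duration_seconds(clip["num_samples"], clip["sampling_rate"])
        (kept if length <= max_seconds else dropped).append(clip["path"])
    return kept, dropped


# Illustrative data: one 5-second clip and one hour-long clip at 44.1 kHz.
clips = [
    {"path": "a.wav", "num_samples": 44_100 * 5, "sampling_rate": 44_100},
    {"path": "b.wav", "num_samples": 44_100 * 3600, "sampling_rate": 44_100},
]

# The 30-second threshold is an arbitrary example value.
kept, dropped = split_by_duration(clips, max_seconds=30.0)
print("kept:", kept)
print("dropped:", dropped)
```

If the data lives in a Hugging Face `datasets` object, the same length check could probably be written as a predicate passed to `Dataset.filter`, so that outliers are removed before the audio encoder ever sees them.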