r9y9 / deepvoice3_pytorch

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
https://r9y9.github.io/deepvoice3_pytorch/

Out of memory while training #90

Closed: yrahul3910 closed this issue 5 years ago

yrahul3910 commented 6 years ago

I'm trying to adapt the model pre-trained on the LJSpeech dataset to my own dataset, which has 60 WAV samples totaling 4 minutes of speech. I'm training with the LJSpeech preset, but with n_workers set to 1. Even with this setting, I keep running into out-of-memory errors, as shown below.

4it [01:44, 26.12s/it]
Loss: 0.8733928799629211
Traceback (most recent call last):
  File "train.py", line 983, in <module>
    train_seq2seq=train_seq2seq, train_postnet=train_postnet)
  File "train.py", line 589, in train
    in tqdm(enumerate(data_loader)):
  File "/usr/local/lib64/python3.6/site-packages/torch/utils/data/dataloader.py", line 451, in __iter__
    return _DataLoaderIter(self)
  File "/usr/local/lib64/python3.6/site-packages/torch/utils/data/dataloader.py", line 239, in __init__
    w.start()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib64/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib64/python3.6/multiprocessing/popen_fork.py", line 66, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

I'm training on my CPU, and I have about 5.1 GB of RAM free before I run the training script. My system details are in the attached screenshot: [screenshot of neofetch output, 2018-06-07]
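From what I can tell, the failure happens inside os.fork() itself rather than in a tensor allocation: each DataLoader worker is forked from the training process, and the kernel has to be willing to commit a copy of the parent's entire address space for the child, so a large training process can fail to fork even with several gigabytes of RAM free. A quick way to inspect the numbers involved (Linux only; a diagnostic sketch, not part of the training code):

```python
# Inspect the kernel's memory-commit accounting, which is what
# decides whether os.fork() succeeds, not just "free" RAM.
with open("/proc/sys/vm/overcommit_memory") as f:
    # 0 = heuristic overcommit (the default), 2 = strict accounting
    print("overcommit_memory:", f.read().strip())

with open("/proc/meminfo") as f:
    for line in f:
        if line.startswith(("MemAvailable", "CommitLimit", "Committed_AS")):
            print(line.strip())
```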

How do I prevent this from happening? When I trained using a smaller dataset with only 2 minutes of speech, this issue never occurred, even with n_workers set to 2.
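For reference, my understanding is that n_workers ends up as the num_workers argument of PyTorch's DataLoader, and setting it to 0 makes the loader fetch batches in the main process with no fork at all. A minimal sketch of what I mean (the dataset below is a hypothetical stand-in, not the repo's actual loader):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Hypothetical stand-in for the real speech dataset."""
    def __init__(self, n=60):
        self.items = [torch.randn(100) for _ in range(n)]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]

# num_workers=0 loads batches in the calling process, so the
# os.fork() call in the traceback above is never reached.
data_loader = DataLoader(ToyDataset(), batch_size=16, num_workers=0)

for batch in data_loader:
    pass  # training step would go here
```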

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.