Please share your error messages here, as you mentioned in your Reddit comment.
How long is the shortest audio in your training dataset?
Try adding this code in dataloader.py:

```python
def my_getitem(self, idx):
    wavpath = self.wav_list[idx]
    melpath = wavpath.replace('.wav', '.mel')
    sr, audio = read_wav_np(wavpath)
    audio = torch.from_numpy(audio).unsqueeze(0)
    mel = torch.load(melpath).squeeze(0)

    # Trim both tensors to a whole number of hops so the audio and
    # mel-spectrogram lengths stay aligned.
    frame_num = min(mel.size(1), audio.size(1) // self.hp.audio.hop_length)
    audio = audio[:, 0:frame_num * self.hp.audio.hop_length]
    mel = mel[:, 0:frame_num]
```
Assuming that this is caused by audios shorter than 16000 samples, I'm working to solve this on the `padshort` branch.
@bob80333 Will you try with the `padshort` branch again?
You'll need to regenerate the mel-spectrograms first.
```
git fetch origin
git checkout padshort
```
Assuming the smallest audio file (by size) is also the shortest, the shortest file has 29548 samples according to sox.
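For anyone checking their own data, sample counts can also be read with Python's standard-library wave module (a small sketch; the dataset path is just an example):

```python
import glob
import wave

def shortest_wav(dataset_dir):
    # Walk the dataset and return (path, sample_count) of the shortest clip.
    shortest = None
    for path in glob.glob(f"{dataset_dir}/**/*.wav", recursive=True):
        with wave.open(path, "rb") as f:
            n = f.getnframes()  # number of samples per channel
        if shortest is None or n < shortest[1]:
            shortest = (path, n)
    return shortest

print(shortest_wav("datasets/my_corpus"))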
It's managed to do more than 25 epochs on the `padshort` branch so far, so I think that has solved it. Thanks for the help!
Thanks for letting me know! I'll merge the `padshort` branch into `master`.
@seungwonpark it's happening again, even though the data didn't have any audio shorter than 16k samples. My workaround is this edit to the dataloader.py module:
https://gist.github.com/deepconsc/28d517597196e361faa2f07628ccf855
P.S. Don't change the batch size during validation, since in that phase most of the tensors have different shapes.
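For reference, the general shape of that kind of workaround is to zero-pad any clip shorter than one training segment before cropping. A minimal sketch of the idea only, not the gist's exact code; the `segment_length` and `hop_length` parameter names are assumptions:

```python
import torch.nn.functional as F

def pad_short(audio, mel, segment_length, hop_length):
    # audio: (1, T) waveform; mel: (n_mels, T // hop_length) spectrogram.
    # Zero-pad clips shorter than one training segment so that random
    # cropping in __getitem__ never yields tensors of mismatched size.
    if audio.size(1) < segment_length:
        audio = F.pad(audio, (0, segment_length - audio.size(1)))
    frames_needed = segment_length // hop_length
    if mel.size(1) < frames_needed:
        mel = F.pad(mel, (0, frames_needed - mel.size(1)))  # pad time axis
    return audio, mel
```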
I created a small test dataset that you can replicate by downloading this podcast and following these steps.
I then used ffmpeg to convert it to a mono 22050 Hz wav file with:

```
ffmpeg -i input.mp3 -ac 1 -ar 22050 output.wav
```
I used sox to split on silence into many smaller pieces in a `split_files` output folder with:

```
sox -V3 output.wav split_files/output.wav silence -l 0 3.0 1.0 5% : newfile : restart
```
There should be 240 pieces.
The last 24 pieces were used for validation.
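Assuming sox's default numbered output names (output001.wav, output002.wav, ...), moving the last 24 pieces into a validation folder can be scripted like this (directory names are illustrative):

```python
import shutil
from pathlib import Path

files = sorted(Path("split_files").glob("output*.wav"))
val_dir = Path("val_files")
val_dir.mkdir(exist_ok=True)
for f in files[-24:]:  # the last 24 pieces become the validation set
    shutil.move(str(f), val_dir / f.name)
```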
Here are two separate errors. (Note that dataloader shuffling was set to False for both of these runs, yet they crash at different steps.)
```
[eric@eric-pc melgan]$ python trainer.py -c config/default.yaml -n test4
2019-10-24 23:06:54,795 - INFO - Starting new training run.
Validation loop: 100%|██████████| 24/24 [00:03<00:00,  6.90it/s]
g 31.2470 d 56.5574 | step 13: 100%|██████████| 13/13 [00:08<00:00,  1.51it/s]
2019-10-24 23:07:10,354 - INFO - Saved checkpoint to: chkpt/test4/test4_df8b090_0000.pt
g 29.4583 d 55.8972 | step 26: 100%|██████████| 13/13 [00:06<00:00,  1.91it/s]
g 29.3384 d 55.7414 | step 39: 100%|██████████| 13/13 [00:06<00:00,  1.90it/s]
g 31.0743 d 55.8826 | step 52: 100%|██████████| 13/13 [00:06<00:00,  1.87it/s]
g 30.2437 d 55.5219 | step 65: 100%|██████████| 13/13 [00:06<00:00,  1.89it/s]
Validation loop: 100%|██████████| 24/24 [00:03<00:00,  6.98it/s]
g 32.9035 d 58.3628 | step 78: 100%|██████████| 13/13 [00:06<00:00,  1.88it/s]
g 32.2074 d 55.6909 | step 91: 100%|██████████| 13/13 [00:06<00:00,  1.87it/s]
g 30.4200 d 55.2120 | step 93:  15%|█▌        | 2/13 [00:01<00:09,  1.20it/s]
2019-10-24 23:07:59,489 - INFO - Exiting due to exception: Caught RuntimeError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/usr/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in