r9y9 / deepvoice3_pytorch

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
https://r9y9.github.io/deepvoice3_pytorch/

AssertionError #32

Closed by homink 6 years ago

homink commented 6 years ago

Hi,

I am new to PyTorch and am following the JSUT example here. I ran into the following assertion error, which I am unable to debug any further. Could anyone help me out?

[kwon@ssi-dnn-slave-002 deepvoice3_pytorch]$ python -V
Python 3.5.4 :: Anaconda custom (64-bit)
[kwon@ssi-dnn-slave-002 deepvoice3_pytorch]$ ls /home/kwon/copora/jsut_ver1.1
basic5000  ChangeLog.txt  countersuffix26  LICENCE.txt  loanword128  onomatopee300  precedent130  README_en.txt  README_ja.txt  repeat500  travel1000  utparaphrase512  voiceactress100
[kwon@ssi-dnn-slave-002 deepvoice3_pytorch]$ python preprocess.py jsut /home/kwon/copora/jsut_ver1.1 ./data/jsut
  0%|                                                                                                                                                                               | 0/7696 [00:00<?, ?it/s]concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/kwon/anaconda3/lib/python3.5/concurrent/futures/process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/kwon/3rdParty/deepvoice3_pytorch/jsut.py", line 52, in _process_utterance
    mel_spectrogram = audio.melspectrogram(wav).astype(np.float32)
  File "/home/kwon/3rdParty/deepvoice3_pytorch/audio.py", line 50, in melspectrogram
    assert S.max() <= 0 and S.min() - hparams.min_level_db >= 0
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "preprocess.py", line 47, in <module>
    preprocess(mod, in_dir, out_dir, num_workers)
  File "preprocess.py", line 21, in preprocess
    metadata = mod.build_from_path(in_dir, out_dir, num_workers, tqdm=tqdm)
  File "/home/kwon/3rdParty/deepvoice3_pytorch/jsut.py", line 25, in build_from_path
    return [future.result() for future in tqdm(futures)]
  File "/home/kwon/3rdParty/deepvoice3_pytorch/jsut.py", line 25, in <listcomp>
    return [future.result() for future in tqdm(futures)]
  File "/home/kwon/anaconda3/lib/python3.5/concurrent/futures/_base.py", line 405, in result
    return self.__get_result()
  File "/home/kwon/anaconda3/lib/python3.5/concurrent/futures/_base.py", line 357, in __get_result
    raise self._exception
AssertionError

r9y9 commented 6 years ago

That's a sign that clipping occurs during feature normalization. As a quick fix, try allow_clipping_in_normalization=True. This was the default for months, and I believe it doesn't affect speech quality much.

https://github.com/r9y9/deepvoice3_pytorch/blob/aeed2258a9b1111669a793ae161cfe0a7396fdf0/hparams.py#L119-L122
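
For readers who want to see what the failing check guards against, here is a minimal sketch. It is not the repository's actual code. Only the assertion condition is taken from the traceback above. The function name `normalize`, the `allow_clipping` argument, the normalization formula, and the value -100 for `MIN_LEVEL_DB` are assumptions made for illustration.

```python
import numpy as np

# Assumed value for illustration only; the real setting lives in hparams.py.
MIN_LEVEL_DB = -100.0


def normalize(S, min_level_db=MIN_LEVEL_DB, allow_clipping=False):
    """Map a dB-scaled spectrogram S to the range [0, 1].

    S is assumed to be already shifted by the reference level, so that
    in-range values lie in [min_level_db, 0].
    """
    if not allow_clipping:
        # Same condition as the assertion shown in the traceback.
        assert S.max() <= 0 and S.min() - min_level_db >= 0
    # Out-of-range values are clipped to 0 or 1 (assumed formula).
    return np.clip((S - min_level_db) / -min_level_db, 0.0, 1.0)


# One value below the floor (-120 dB) and one above the ceiling (+5 dB).
S = np.array([-120.0, -50.0, 5.0])
clipped = normalize(S, allow_clipping=True)
print(clipped)
```

With `allow_clipping=False` the same input raises `AssertionError`, which mirrors the behaviour reported in this issue. With clipping allowed, the two out-of-range values are silently saturated.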

If you want to avoid clipping completely (it rarely happens, though), you need to adjust the normalization parameters: https://github.com/r9y9/deepvoice3_pytorch/blob/aeed2258a9b1111669a793ae161cfe0a7396fdf0/hparams.py#L117-L118
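
One possible way to choose normalization parameters that avoid clipping is sketched below. It assumes the two parameters are a reference level and a minimum level in dB, here called `ref_level_db` and `min_level_db` (only `min_level_db` appears in the traceback; the other name is an assumption). The helper `suggest_normalization_params` is hypothetical and not part of the repository. It takes a dB-scaled spectrogram before the reference level is subtracted and returns values that satisfy the assertion for that data.

```python
import numpy as np


def suggest_normalization_params(S_db):
    """Return (ref_level_db, min_level_db) so that S_db - ref_level_db
    stays inside [min_level_db, 0].

    Hypothetical helper for illustration; not part of deepvoice3_pytorch.
    """
    # The shifted maximum must not exceed 0 dB.
    ref_level_db = float(np.ceil(S_db.max()))
    # The shifted minimum must not fall below min_level_db.
    min_level_db = float(np.floor(S_db.min() - ref_level_db))
    return ref_level_db, min_level_db


# Toy dB values standing in for a real mel spectrogram.
S_db = np.array([-95.3, -40.0, 23.7])
ref_level_db, min_level_db = suggest_normalization_params(S_db)

# The condition from the traceback now holds for this data.
S = S_db - ref_level_db
assert S.max() <= 0 and S.min() - min_level_db >= 0
print(ref_level_db, min_level_db)
```

In practice the extremes would be collected over the whole dataset rather than a single utterance. Widening the range also compresses typical values into a smaller part of [0, 1], so this is a trade-off rather than a free fix.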

homink commented 6 years ago

Thanks a lot!