mozilla / TTS

:robot: :speech_balloon: Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Mozilla Public License 2.0

TTS example on colab not working #113

Closed by faaip 5 years ago

faaip commented 5 years ago

Hi all.

I have been trying to train a voice using TTS_example.ipynb as hosted on Colab. When I run the final cell:

!python train.py --config_path config.json --data_path ../LJSpeech-1.1/ | tee training.log

I get the following error message:

`
Traceback (most recent call last):
  File "train.py", line 501, in <module>
    main(args)
  File "train.py", line 359, in main
    num_chars = len(phonemes) if c.use_phonemes else len(symbols)
AttributeError: 'AttrDict' object has no attribute 'use_phonemes'

Using CUDA: True
Number of GPUs: 1
Git Hash: dce1715
Experiment folder: /content/TTS/../keep/TTS-master-February-26-2019_01+57PM-dce1715
Setting up Audio Processor...
 | > fft size: 2048, hop length: 275, win length: 1102
 | > Audio Processor attributes.
 | > bits:None
 | > sample_rate:22050
 | > num_mels:80
 | > min_level_db:-100
 | > frame_shift_ms:12.5
 | > frame_length_ms:50
 | > ref_level_db:20
 | > num_freq:1025
 | > power:1.5
 | > preemphasis:0.97
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:False
 | > mel_fmin:0
 | > mel_fmax:None
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > n_fft:2048
 | > hop_length:275
 | > win_length:1102
! Run is removed from /content/TTS/../keep/TTS-master-February-26-2019_01+57PM-dce1715
`

Does anyone know how this can be fixed? Thanks!

erogol commented 5 years ago

The notebook needs to be updated with the latest config.json.
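The `AttributeError` above is what an outdated config looks like from the training script's side: `train.py` reads a key the config does not define. As a rough sketch of how to spot such gaps before training, the snippet below compares the top-level keys of two configs. The helper name `missing_keys` and the toy config values are hypothetical, chosen for illustration only. It also assumes the configs have already been parsed into plain dicts.

```python
import json


def missing_keys(config: dict, reference: dict) -> list:
    """Return top-level keys present in `reference` but absent from `config`."""
    return sorted(set(reference) - set(config))


# Toy stand-ins for an outdated and an up-to-date config (illustrative values only).
old_config = json.loads('{"num_mels": 80, "sample_rate": 22050}')
new_config = json.loads(
    '{"num_mels": 80, "sample_rate": 22050, '
    '"use_phonemes": true, "phoneme_language": "en-us"}'
)

# Keys the training script may expect but the old config lacks.
print(missing_keys(old_config, new_config))
# → ['phoneme_language', 'use_phonemes']
```

Any key reported here is a candidate for the same kind of `AttributeError` once the script tries to read it.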

faaip commented 5 years ago

Thanks, @erogol. I tried the newest config.json instead (https://github.com/mozilla/TTS/blob/master/config.json) and now get the following error:

`
> Using CUDA: True
Number of GPUs: 1
Git Hash: dce1715
Experiment folder: /media/erogol/data_ssd/Data/models/ljspeech_models/queue-February-27-2019_01+02PM-dce1715
Setting up Audio Processor...
 | > fft size: 2048, hop length: 275, win length: 1102
 | > Audio Processor attributes.
 | > bits:None
 | > sample_rate:22050
 | > num_mels:80
 | > min_level_db:-100
 | > frame_shift_ms:12.5
 | > frame_length_ms:50
 | > ref_level_db:20
 | > num_freq:1025
 | > power:1.5
 | > preemphasis:0.98
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:False
 | > mel_fmin:0
 | > mel_fmax:None
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > n_fft:2048
 | > hop_length:275
 | > win_length:1102
 | > Number of characters : 61
 | > Num output units : 1025

Starting a new training
 | > Model has 7000370 parameters
DataLoader initialization
 | > Data path: ../LJSpeech-1.1/
 | > Use phonemes: True
 | > phoneme language: en-us
 | > Cached dataset: False
 | > Number of instances : 12000
 | > Max length sequence: 187
 | > Min length sequence: 5
 | > Avg length sequence: 98.4255
 | > 0 instances are ignored (0)
 | > Batch group shuffling is active.
 | > Epoch 0/1000
Traceback (most recent call last):
  File "train.py", line 501, in <module>
    main(args)
  File "train.py", line 439, in main
    scheduler, ap, epoch)
  File "train.py", line 79, in train
    for num_iter, data in enumerate(data_loader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/content/TTS/TTS/datasets/TTSDataset.py", line 159, in __getitem__
    return self.load_data(idx)
  File "/content/TTS/TTS/datasets/TTSDataset.py", line 118, in load_data
    text = self.load_phoneme_sequence(wav_file, text)
  File "/content/TTS/TTS/datasets/TTSDataset.py", line 94, in load_phoneme_sequence
    phoneme_to_sequence(text, [self.cleaners], language=self.phoneme_language), dtype=np.int32)
  File "/content/TTS/TTS/utils/text/__init__.py", line 48, in phoneme_to_sequence
    phonemes = text2phone(clean_text, language)
  File "/content/TTS/TTS/utils/text/__init__.py", line 30, in text2phone
    ph = phonemize(text, separator=seperator, strip=False, njobs=1, backend='espeak', language=language)
  File "/usr/local/lib/python3.6/dist-packages/phonemizer/phonemize.py", line 90, in phonemize
    backend = backends[backend](language, logger=logger)
  File "/usr/local/lib/python3.6/dist-packages/phonemizer/backend.py", line 165, in __init__
    super(self.__class__, self).__init__(language, logger=logger)
  File "/usr/local/lib/python3.6/dist-packages/phonemizer/backend.py", line 57, in __init__
    '{} not installed on your system'.format(self.name))
RuntimeError: <function EspeakBackend.name at 0x7f4db349df28> not installed on your system

! Run is removed from /media/erogol/data_ssd/Data/models/ljspeech_models/queue-February-27-2019_01+02PM-dce1715
`

How do I go about installing the espeak backend on Colab? Thanks.

erogol commented 5 years ago

Run `apt-get install espeak` the Colab way, i.e. as a shell command prefixed with `!` in a notebook cell.
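For anyone who wants to check for the binary before kicking off a long training run, here is a minimal sketch. It assumes the phonemizer backend needs an executable named `espeak` on `PATH`. The helper names `espeak_available` and `ensure_espeak` are hypothetical, as is the `-y` flag, which is added here only to keep the install non-interactive. With `dry_run=True` nothing is installed; the command is just returned.

```python
import shutil
import subprocess


def espeak_available() -> bool:
    """True if an `espeak` executable is on PATH (binary name is an assumption)."""
    return shutil.which("espeak") is not None


def ensure_espeak(dry_run: bool = True) -> list:
    """Return the apt-get command; run it only if espeak is missing and dry_run is False."""
    cmd = ["apt-get", "install", "-y", "espeak"]
    if not dry_run and not espeak_available():
        # Raises CalledProcessError if the install fails.
        subprocess.run(cmd, check=True)
    return cmd


print("espeak found:", espeak_available())
print("install command:", " ".join(ensure_espeak(dry_run=True)))
```

In a Colab cell the same install is usually written directly as `!apt-get install espeak`; the Python wrapper is only useful if the check needs to live inside a script.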

faaip commented 5 years ago

Perfect. Thank you for your help.

mrgloom commented 5 years ago

Looks like on macOS, `brew install espeak` solves the problem.