mozilla / TTS

:robot: :speech_balloon: Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Mozilla Public License 2.0

Preprocessing on the fly #214

Closed ghost closed 5 years ago

ghost commented 5 years ago

Hi, training is quite slow, and I'm guessing that the spectrograms are generated during training. Can I preprocess all the audio files beforehand to speed up training?

erogol commented 5 years ago

Can you find a way to verify your guess? If your wav files are not resampled to below 20 kHz or converted to a lower bitrate, that might also slow things down.

ghost commented 5 years ago

Hi, thanks for answering so fast. My audio is sampled at 16kHz. I was taking a look at the dataloader in TTS/datasets/TTSDataset.py and inside the collate_fn there is this section:

mel = [self.ap.melspectrogram(w).astype('float32') for w in wav]
linear = [self.ap.spectrogram(w).astype('float32') for w in wav]

I think it's calculating the spectrograms when generating each batch, instead of calculating all of them previously.
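
If the spectrograms did turn out to be the bottleneck, one option would be to compute each one once and cache it on disk. The following is only a minimal sketch of that idea, not how TTS does it: `load_or_compute`, the cache directory, and the key naming are all hypothetical, and it uses `pickle` only to stay dependency-free (for NumPy arrays, `numpy.save` and `numpy.load` would probably be the more natural choice).

```python
import os
import pickle


def load_or_compute(key, compute_fn, cache_dir):
    """Return the cached feature for `key`, computing and saving it on first use.

    `key` is a hypothetical unique name per utterance (e.g. the wav file stem),
    and `compute_fn` is any zero-argument callable that builds the feature.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{key}.pkl")

    # Cache hit: read the stored feature instead of recomputing it.
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)

    # Cache miss: compute once, store on disk, then return.
    feature = compute_fn()
    with open(path, "wb") as f:
        pickle.dump(feature, f)
    return feature
```

Inside the dataset this might be called as `load_or_compute(name, lambda: self.ap.melspectrogram(w), "cache/mel")`, where `name` and the `"cache/mel"` path are assumptions for illustration. The cache would need to be cleared whenever the audio settings change.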

erogol commented 5 years ago

In my experiments, spectrogram computation is not a bottleneck. I'd suggest profiling the code before taking any steps.
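
One rough way to profile this is to compare the time the loop spends waiting for the next batch with the time spent in the training step itself. The sketch below assumes any iterable of batches and any callable that runs one step; `time_loader_vs_step` and `train_step` are hypothetical names, not part of TTS.

```python
import time


def time_loader_vs_step(batches, train_step, max_batches=50):
    """Return (seconds spent waiting for batches, seconds spent in train_step)."""
    load_time = 0.0
    step_time = 0.0
    it = iter(batches)
    for _ in range(max_batches):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # includes any on-the-fly preprocessing
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)  # forward/backward/optimizer step
        t2 = time.perf_counter()
        load_time += t1 - t0
        step_time += t2 - t1
    return load_time, step_time
```

If the first number dominates, the data pipeline is likely the bottleneck; if the second does, precomputing spectrograms will probably not help much. On a GPU the step is launched asynchronously, so the timings are only meaningful if `train_step` waits for the device to finish (for example by calling `torch.cuda.synchronize()` at its end).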

Besides the sampling rate, you can also check the bitrate. Running soxi path/to/audio.wav in the terminal should give you the answer.
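
If sox is not installed, roughly the same information can be read with Python's standard `wave` module. This is a small sketch that assumes uncompressed PCM wav files; `wav_info` is a hypothetical helper name.

```python
import wave


def wav_info(path):
    """Read basic header information from an uncompressed PCM wav file."""
    with wave.open(path, "rb") as wf:
        sample_rate = wf.getframerate()   # samples per second
        channels = wf.getnchannels()
        bits = wf.getsampwidth() * 8      # bits per sample
        return {
            "sample_rate": sample_rate,
            "channels": channels,
            "bits_per_sample": bits,
            # Uncompressed bitrate in bits per second.
            "bitrate": sample_rate * channels * bits,
            "duration_s": wf.getnframes() / sample_rate,
        }
```

For example, a mono 16-bit file sampled at 16 kHz gives a bitrate of 16000 * 1 * 16 = 256000 bits per second.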

ghost commented 5 years ago

OK, thanks a lot. It does seem that the spectrograms are not the bottleneck. Do you know roughly how long it would take to train on a V100?

erogol commented 5 years ago

No idea, but things start to work well at around 100K iterations, depending on the dataset.

ghost commented 5 years ago

Thanks, I'll close the issue now.