Closed ghost closed 5 years ago
Can you find a way to verify your guess? If your wav files are not resampled to 20 kHz or below (or a lower bitrate), that might also slow things down.
Hi, thanks for answering so fast. My audio is sampled at 16kHz. I was taking a look at the dataloader in TTS/datasets/TTSDataset.py and inside the collate_fn there is this section:
mel = [self.ap.melspectrogram(w).astype('float32') for w in wav]
linear = [self.ap.spectrogram(w).astype('float32') for w in wav]
I think it's computing the spectrograms as each batch is generated, instead of precomputing them all beforehand.
In my experiments, spectrogram computation is not a bottleneck. It's better to profile the code before taking any steps.
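One generic way to do that profiling is Python's built-in cProfile. The helper and the dummy batch below are just a sketch (the function names are placeholders, not from the TTS repo); in practice you would wrap one call to the real `collate_fn`.

```python
import cProfile
import io
import pstats

import numpy as np


def profile_call(fn, *args, **kwargs):
    """Run fn under cProfile and return (result, stats report string)."""
    pr = cProfile.Profile()
    pr.enable()
    result = fn(*args, **kwargs)
    pr.disable()
    buf = io.StringIO()
    # Show the 10 most expensive call sites by cumulative time.
    pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()


def dummy_batch():
    # Stand-in for one batch of spectrogram work (8 one-second 16 kHz clips).
    return [np.abs(np.fft.rfft(np.random.randn(16000))) for _ in range(8)]


_, report = profile_call(dummy_batch)
print(report)
```

If the spectrogram calls dominate the report, caching helps; if data loading or the model forward pass dominates, it won't.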
Besides the sampling rate, you can also check the bitrate. Running soxi path/to/audio.wav should print it in the terminal.
Ok, thanks a lot, it does seem that the specs are not the bottleneck. Do you know roughly how long it would take to train on a V100?
No idea, but things start to sound good around 100K iterations, depending on the dataset.
Thanks, I'll close the issue now.
Hi, training is quite slow and I'm guessing the spectrograms are being generated during training. Can I preprocess all the audio files beforehand to speed up training?