Maggione closed this issue 2 years ago
Hi Maggione, since SpeechT5 takes 16 kHz waveforms as input, we resampled the LibriTTS waveforms from 24 kHz to 16 kHz. The down-sampling details are as follows.
import soundfile as sf
import librosa
# file = ...
# new_file = ...
audio, fs = sf.read(file)
# orig_sr and target_sr must be passed as keywords in librosa >= 0.10
x = librosa.resample(audio, orig_sr=fs, target_sr=16000)
sf.write(str(new_file), x, 16000)
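If it helps, here is a minimal sketch of how the same step could be applied to a whole directory tree. It is not taken from the SpeechT5 code: the helper names `mirrored_path` and `resample_tree`, the `*.wav` glob, and the assumption that every file is mono are all illustrative choices.

```python
from pathlib import Path

TARGET_SR = 16000  # sample rate SpeechT5 expects, per the reply above


def mirrored_path(src: Path, src_root: Path, dst_root: Path) -> Path:
    """Map src (located under src_root) to the same relative path under dst_root."""
    return dst_root / src.relative_to(src_root)


def resample_tree(src_root: Path, dst_root: Path, target_sr: int = TARGET_SR) -> int:
    """Resample every .wav under src_root into dst_root; return the number of files written."""
    # Imported here so the path helper above can be used without the audio libraries.
    import librosa
    import soundfile as sf

    count = 0
    for src in sorted(src_root.rglob("*.wav")):
        dst = mirrored_path(src, src_root, dst_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        audio, fs = sf.read(str(src))  # assumes mono audio (hypothetical simplification)
        if fs != target_sr:
            audio = librosa.resample(audio, orig_sr=fs, target_sr=target_sr)
        sf.write(str(dst), audio, target_sr)
        count += 1
    return count
```

A call such as `resample_tree(Path("LibriTTS"), Path("LibriTTS_16k"))` would then write the 16 kHz copies next to the originals while keeping the speaker/chapter layout; both directory names are placeholders.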
Hi, the SpeechT5 paper uses LibriSpeech 960h, whose sample rate is 16 kHz, as the speech pre-training dataset, but uses LibriTTS, whose sample rate is 24 kHz, as the TTS dataset. How do you deal with this mismatch? Thank you! :)