Closed HudsonHuang closed 3 years ago
Hi, thanks for your suggestion. I'm actually considering ditching librosa for torchaudio, especially after I chose to do silence trimming with sox instead of webrtcvad.
Since I'd like to keep the preprocessing modules as simple as possible (importing as few packages as possible), I probably need some time to study how sox effects are used in the most recent version of torchaudio.
I've developed completely new preprocessing toolkits which use torchaudio, can be compiled with TorchScript, and can be used anywhere without any dependencies.
I appreciate your efforts, nice work. But your audio_toolkit is implemented with librosa and numpy, so it is not differentiable, which may limit its applications. E.g., if I have a TTS model that generates Mel spectrograms, and your dvector is fully differentiable, we can use it like a discriminator to force the TTS model's output to match the expected speaker. From waveform to Mel spectrogram, you can make preprocessing fully differentiable with torchaudio, and it seems it can stay consistent with librosa.