Closed schinavro closed 2 years ago
I had the same issue trying to use the pretrained model. In my case it generated silence only.
Here are the exact steps I use to generate audio from the pretrained model:
git clone https://github.com/lmnt-com/diffwave
cd diffwave
pip install .
wget https://lmnt.com/assets/diffwave/diffwave-ljspeech-22kHz-1000578.pt
wget https://lmnt.com/assets/diffwave/22kHz/ljspeech/reference_0.wav
python -m diffwave.preprocess .
python -m diffwave.inference diffwave-ljspeech-22kHz-1000578.pt -s reference_0.wav.spec.npy -f
The result is placed in output.wav and should sound reasonable. Can you give these steps a shot and see if it works for you?
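As a quick sanity check on the result, you can measure the RMS level of output.wav to distinguish genuine silence from merely quiet or unstable audio. This helper is not part of the diffwave repo; it is a small stdlib-only sketch that assumes a 16-bit mono WAV file (which is what the inference script produces in my setup):

```python
import wave
import struct
import math

def rms_dbfs(path):
    """Return the RMS level of a 16-bit mono WAV file in dBFS."""
    with wave.open(path, 'rb') as f:
        n = f.getnframes()
        samples = struct.unpack('<%dh' % n, f.readframes(n))
    # Root-mean-square of the samples, guarded against empty files.
    rms = math.sqrt(sum(s * s for s in samples) / max(len(samples), 1))
    # Normalize by the 16-bit full-scale value; floor avoids log10(0).
    return 20 * math.log10(max(rms / 32768.0, 1e-9))
```

A file of pure digital silence comes out near the -180 dBFS floor, while normal synthesized speech typically sits well above -40 dBFS, so `rms_dbfs('output.wav')` makes the "silence only" failure mode easy to detect programmatically.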
The most common cause for silence / unstable audio is incorrect input scaling. I've updated the preprocessing script to behave correctly with newer versions of torchaudio; maybe that was the issue you were running into.
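To illustrate what "input scaling" means here, the following sketch mirrors the usual log-mel normalization that maps raw mel magnitudes into [0, 1] before they are fed to the vocoder. The exact constants (1e-5 floor, -20 dB reference, 100 dB range) are assumptions for illustration; check diffwave/preprocess.py for the authoritative values:

```python
import math

def normalize_mel(magnitudes):
    """Sketch of a log-mel normalization: convert magnitudes to dB
    with a floor, then shift/scale roughly [-120, 0] dB into [0, 1].
    Constants are illustrative, not taken verbatim from the repo."""
    out = []
    for m in magnitudes:
        db = 20 * math.log10(max(m, 1e-5)) - 20
        out.append(min(max((db + 100) / 100, 0.0), 1.0))
    return out
```

If a spectrogram is produced with a different scaling (for example, natural log instead of log10, or no clamping at all), the model sees inputs far outside its training distribution, which is exactly the kind of mismatch that yields silence or unstable audio.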
Hi, I'm having trouble using the pretrained model and badly need your help.
I wanted to check the performance of Diffwave with the pretrained parameters. Since there was no demo for it, I wrote my own script that imports the pretrained model.
The purpose of the script is to compare the original audio with the audio generated by the pretrained vocoder.
First, I generated a mel spectrogram from one of the audio samples provided at https://github.com/lmnt-com/diffwave#audio-samples.
Using the created spectrogram, spec, I generated an audio file that should sound similar to the original. However, the result was far from the original: it was unstable and did not resemble the demo samples.
Is there a problem in my code, or is there a proper way to use the pretrained parameters?
I would really appreciate any example code showing how to use the pretrained model properly.
Thanks.