Closed: Pranjalya closed this issue 3 years ago
Interested in this too. I trained it on a custom dataset and tried the pretrained one with the same result. @Pranjalya were you able to progress regarding this?
@yeswecan Sadly, not yet.
Can you provide repro steps for the pretrained model? I've verified that it works fine over here with the following recipe:
git clone https://github.com/lmnt-com/diffwave
cd diffwave
wget https://lmnt.com/assets/diffwave/diffwave-ljspeech-22kHz-1000578.pt
# copy or link LJSpeech's wavs/ directory into the current directory
python src/diffwave/preprocess.py wavs/
python src/diffwave/inference.py diffwave-ljspeech-22kHz-1000578.pt wavs/LJ001-0001.wav.spec.npy -o output.wav
I got the same problem as @Pranjalya. I used LJ001-0001.wav from LJSpeech-1.1 to generate the mel-spectrogram LJ001-0001.wav.spec.npy via
python src/diffwave/preprocess.py wavs/
and then generated output.wav via
wget https://lmnt.com/assets/diffwave/diffwave-ljspeech-22kHz-1000578.pt
python src/diffwave/inference.py diffwave-ljspeech-22kHz-1000578.pt wavs/LJ001-0001.wav.spec.npy -o output.wav -f
P.S. The torch CPU build is used. Here is the output file: output.zip
Thanks for the attachment; that was very helpful.
Okay, looks like torchaudio made some breaking changes, and that's why folks are running into this problem. I'm going to pin the requirement to torchaudio==0.7.0 instead of torchaudio>=0.6.0.
The issue is that torchaudio used to load 16-bit samples to floating point in [-32768.0, 32767.0]. Newer versions of torchaudio rescale the samples to [-1, 1] (and they got rid of torchaudio.load_wav entirely in 0.9.0).
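To make the scale mismatch concrete, here is a small sketch (numpy stands in for torchaudio's tensors; the exact constants mirror the description above). preprocess.py divides the loaded audio by 32767.5, which assumes the old [-32768, 32767] convention; under the new [-1, 1] convention the same division collapses the signal to near-silence:

```python
import numpy as np

# The same 16-bit PCM samples under the two torchaudio conventions.
pcm = np.array([-32768, -16384, 0, 16384, 32767], dtype=np.int16)

# torchaudio <= 0.7 (torchaudio.load_wav): raw sample values as floats.
old_style = pcm.astype(np.float32)            # range [-32768.0, 32767.0]

# torchaudio >= 0.8 (torchaudio.load): samples rescaled to [-1.0, 1.0].
new_style = pcm.astype(np.float32) / 32768.0  # range [-1.0, 1.0]

# Dividing by 32767.5 (as preprocess.py does) is correct for the old
# convention but shrinks new-style audio by another factor of ~32768.
print(np.abs(old_style / 32767.5).max())  # ~1.0
print(np.abs(new_style / 32767.5).max())  # ~3.05e-05 -> near-silent input
```

That factor-of-32768 attenuation is why the resulting spectrograms, and hence the vocoder output, come out almost silent.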
Otherwise, revising the transform function works:
# line 33 in preprocess.py
# audio = torch.clamp(audio[0] / 32767.5, -1.0, 1.0)
audio = torch.clamp(audio[0], -1.0, 1.0)
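If you need one preprocessing script to work across both torchaudio conventions, a possible workaround is to rescale based on the data's observed range. This is only a sketch, not code from the repo; normalize_audio is a hypothetical helper, written with numpy for illustration (the torch version would use torch.clamp the same way):

```python
import numpy as np

def normalize_audio(audio):
    # Hypothetical helper. Older torchaudio.load_wav returned floats in
    # [-32768, 32767]; newer torchaudio.load returns floats in [-1, 1].
    # Rescale only when the data is clearly in the 16-bit integer range.
    audio = np.asarray(audio, dtype=np.float32)
    if np.abs(audio).max() > 1.0:
        audio = audio / 32767.5
    return np.clip(audio, -1.0, 1.0)
```

Caveat: the heuristic misfires on a 16-bit-range clip whose peak happens to stay below 1.0, so pinning the torchaudio version is the more reliable fix.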
I am also getting static noise from the output of the diffwave vocoder. Can you please help me here?
Updating the torchaudio version to 0.7.0, or revising line 33 in preprocess.py as above, also works.
@YueZhou-oh Actually, my version is 0.9.0, and the mel-spectrogram is produced by a different model; I want to produce the output from this vocoder, but after generation the audio is simply noise.
Seems like an amplitude issue. Maybe you can compare the amplitude range of the mel-spectrograms generated by your model against those from src/diffwave/preprocess.py.
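A quick way to run that comparison is to print the value range of both saved spectrograms side by side. A minimal sketch (the file names in the usage comment are hypothetical):

```python
import numpy as np

def spec_range(spec):
    # Report min/max of a mel-spectrogram array so the output scales of
    # two models can be compared side by side.
    spec = np.asarray(spec, dtype=np.float32)
    return float(spec.min()), float(spec.max())

# Usage (hypothetical file names):
#   mine = np.load("my_model.spec.npy")
#   ref  = np.load("wavs/LJ001-0001.wav.spec.npy")
#   print(spec_range(mine), spec_range(ref))
```

If the two ranges differ wildly (e.g. one model emits log-mel values and the other raw magnitudes), the vocoder will receive inputs far outside its training distribution and produce noise.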
While inferencing with the provided LJSpeech pretrained model and one of the reference audio files, the output is a very low-amplitude sound (almost silence). And when I used a model trained on a custom dataset, the inference result was static noise. What could be going wrong?