Open trfnhle opened 2 years ago
Dear @l4zyf9x,
We found some issues that depend on the PyTorch and torchaudio versions, and we will try to fix them soon. Could you try PyTorch 1.8.0 with torchaudio 0.8.0, or PyTorch 1.8.1 with torchaudio 0.8.1?
Here is my training loss figure:
Thank you!
@neillu23 Thanks for your quick response. I will try the torch and torchaudio versions you suggest. Btw, I have some more questions. I noticed that the sample audio contains both raw_enhanced.wav and enhanced.wav. What is the difference between them? One more thing: when the se_pre flag is used, the code seems to condition the diffusion step on clean audio. I don't see the motivation for using clean audio in the diffusion step.
Hi @l4zyf9x, sorry I missed your last message. I've replaced torchaudio.load_wav() with torchaudio.load() in the new commit, so you can try it with the newer torch and torchaudio versions. The enhanced.wav files are further combined with a noise signal at a ratio of 0.2 to recover high-frequency speech, as described at the end of Sec. 4.1, while raw_enhanced.wav is the result without this combination. The "se_pre" step was designed for our previous work, DiffuSE; we tried the same initialization for CDiffuSE while writing the paper. Afterwards, we found that the pre-training step was no longer needed in CDiffuSE, since a randomly initialized CDiffuSE performed as well as one initialized from pre-trained parameters. Please try the new code and let me know if you have any further questions!
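For readers wondering what the difference between raw_enhanced.wav and enhanced.wav could look like in code, here is a minimal sketch. It assumes that the "noise signal" is the original noisy input and that the combination is a simple linear blend with weight 0.2 on that input; the exact formula in the repository may differ. The function name `blend_with_noisy` is hypothetical and not taken from the CDiffuSE code, and plain Python lists stand in for audio tensors.

```python
def blend_with_noisy(raw_enhanced, noisy, ratio=0.2):
    """Mix a fraction of the noisy input back into the raw model output.

    Assumption: enhanced = (1 - ratio) * raw_enhanced + ratio * noisy.
    This is an illustrative guess at the combination, not the repo's code.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    # Trim both signals to their common length before mixing.
    n = min(len(raw_enhanced), len(noisy))
    return [
        (1.0 - ratio) * e + ratio * x
        for e, x in zip(raw_enhanced[:n], noisy[:n])
    ]


# Toy samples standing in for waveform values.
raw = [1.0, 0.0, -0.5]
noisy = [0.0, 1.0, 0.5]
enhanced = blend_with_noisy(raw, noisy)
```

With `ratio=0.0` the function returns the raw output unchanged, which corresponds to raw_enhanced.wav under this assumption.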
Hello, have you tried the version the author mentioned? How is the performance?
First of all, thank you for your great work. I tried to reproduce the results on the Voicebank dataset with your code but ran into some problems. I ran inference with the 100k checkpoint, but the result is not comparable to your sample files and background noise still remains.
The steps I followed:
Could you give me some insight into what I might be doing wrong?