Reproduce on Voicebank dataset

neillu23 / CDiffuSE

Conditional Diffusion Probabilistic Model for Speech Enhancement

Apache License 2.0

215 stars, 34 forks

Reproduce on Voicebank dataset #2

Open · trfnhle opened this issue 2 years ago

trfnhle commented 2 years ago

First of all, thank you for your great work. I tried to reproduce your results on the Voicebank dataset with your code but ran into some problems. I ran inference with the 100k-step checkpoint, but the result does not match the quality of your sample files and still contains background noise.

Here are the steps I followed:

1. Preprocessed the Voicebank dataset with the flag se.
2. Trained without any modification.

Here is my loss figure:

Could you give some insight into what I might be doing wrong?

neillu23 commented 2 years ago

Dear @l4zyf9x,

We found some issues with certain PyTorch and torchaudio versions, and we will try to fix them soon. Could you try PyTorch 1.8.0/torchaudio 0.8.0 or PyTorch 1.8.1/torchaudio 0.8.1?
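Before retrying, it may help to confirm which versions are actually installed. The snippet below is a minimal sketch, not part of the CDiffuSE repository: it uses `importlib.metadata` from the Python standard library (3.8+) to read the installed `torch` and `torchaudio` versions and checks them against the two suggested pairs. The helper names (`installed_version`, `strip_local`, `is_suggested_pair`) are hypothetical, and it assumes a local build suffix such as "+cu111" can be ignored for this comparison.

```python
import importlib.metadata
from typing import Optional

# (torch, torchaudio) pairs suggested by the maintainer in this thread.
SUGGESTED_PAIRS = {("1.8.0", "0.8.0"), ("1.8.1", "0.8.1")}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def strip_local(version: Optional[str]) -> Optional[str]:
    """Drop a local build suffix such as '+cu111' (assumed irrelevant here)."""
    return version.split("+")[0] if version else version


def is_suggested_pair(torch_version: Optional[str], torchaudio_version: Optional[str]) -> bool:
    """True if the (torch, torchaudio) pair is one of the suggested combinations."""
    return (strip_local(torch_version), strip_local(torchaudio_version)) in SUGGESTED_PAIRS


if __name__ == "__main__":
    torch_v = installed_version("torch")
    audio_v = installed_version("torchaudio")
    print(f"torch={torch_v} torchaudio={audio_v} suggested={is_suggested_pair(torch_v, audio_v)}")
```

Note that mixing, for example, torch 1.8.0 with torchaudio 0.8.1 is reported as not matching, since the suggestion was given as specific pairs.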

Here is my training loss figure:

Thank you!

trfnhle commented 2 years ago

@neillu23 Thanks for your quick response. I will try the torch and torchaudio versions you suggested. By the way, I have a few more questions. I noticed that the sample audio includes both raw_enhanced.wav and enhanced.wav. What is the difference between them? One more thing: when we use the flag se_pre, it seems to use clean audio as the condition in the diffusion step. I don't see the motivation for using clean audio in the diffusion step.

neillu23 commented 2 years ago

Hi @l4zyf9x, sorry I missed your last message. In the new commit, I've replaced torchaudio.load_wav() with torchaudio.load(), so you can try it with the newer torch and torchaudio versions.

enhanced.wav is the model output further combined with the noisy signal at a ratio of 0.2 to recover high-frequency speech, as described at the end of Sec. 4.1, while raw_enhanced.wav is the output without this combination.

The "se_pre" step was designed for our previous work, DiffuSE, and we tried the same initialization for CDiffuSE while writing the paper. Afterwards, we found that the pre-training step is no longer needed in CDiffuSE, since CDiffuSE initialized randomly performed as well as CDiffuSE initialized from pre-trained parameters. Please try the new code and let me know if you have any further questions!
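The difference between raw_enhanced.wav and enhanced.wav can be illustrated as a per-sample blend. This is a minimal sketch, not the repository's code: it assumes the signal blended in is the noisy input mixture, that "a ratio of 0.2" means the convex combination (1 - 0.2) * enhanced + 0.2 * noisy, and that both waveforms are time-aligned and equal in length. The function name `recombine` is hypothetical, and the exact formula in the CDiffuSE inference script may differ.

```python
from typing import List, Sequence


def recombine(raw_enhanced: Sequence[float], noisy: Sequence[float], ratio: float = 0.2) -> List[float]:
    """Blend the raw model output with the noisy input, sample by sample.

    Assumption: a convex combination, so ratio=0.0 returns the raw output
    unchanged and ratio=1.0 returns the noisy input.
    """
    if len(raw_enhanced) != len(noisy):
        raise ValueError("signals must have the same length")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    # Mixing part of the noisy input back in is meant to restore
    # high-frequency speech content that the model over-suppresses.
    return [(1.0 - ratio) * e + ratio * n for e, n in zip(raw_enhanced, noisy)]
```

With this formulation, the default `ratio=0.2` keeps 80% of the model output and adds back 20% of the noisy input; a larger ratio recovers more high-frequency detail at the cost of reintroducing more background noise.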

KarsonYu commented 1 year ago

Hello, have you tried the versions the author mentioned? And how is the performance?

Charizard-007 commented 10 months ago

Hello, have you tried the versions the author mentioned? And how is the performance?