sp-uhh / storm

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
MIT License

About reverse steps N? #2

Closed kobecccccc closed 1 year ago

kobecccccc commented 1 year ago

Hi, thanks for your wonderful work. I trained StoRM with the same hyperparameters on LibriMix. It works well when N=50 or N=30, but the performance drops a lot when N=10. This seems different from the paper. Are there any other parameters that should be changed when N is set to a small value?

jmlemercier commented 1 year ago

Hi! Could you report your objective metrics here (including those of the mixture), please? In the paper we report that performance remains very good down to N=20, but there is still quite a drop between N=20 and N=10. Your drop could be larger because of different training data and conditions. If you can describe your setup in more depth, I'd be interested to know and could help you find a solution if this is a bug of some sort. Best, Jm
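The sensitivity to N comes from discretizing the reverse SDE: fewer steps means coarser updates, so residual noise is removed less accurately. Below is a minimal toy sketch of an Euler-Maruyama reverse sampler (not the repository's actual sampler; `score`, the drift form, and all constants are hypothetical stand-ins for illustration only):

```python
import numpy as np

def reverse_euler_maruyama(x_T, score, N, T=1.0, t_eps=0.03, g=0.5, rng=None):
    """Toy reverse-SDE sampler: integrate backwards from t=T to t=t_eps
    in N Euler-Maruyama steps. `score` stands in for a trained score model."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x_T, dtype=float).copy()
    dt = (T - t_eps) / N          # larger N -> finer step size
    t = T
    for _ in range(N):
        drift = -(g ** 2) * score(x, t)          # reverse-time drift term
        noise = g * np.sqrt(dt) * rng.standard_normal(x.shape)
        x = x - drift * dt + noise               # one backward step
        t -= dt
    return x
```

The key point is that `dt` scales as 1/N, so halving N doubles the discretization step and the integration error, which can surface as audible residual noise.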

kobecccccc commented 1 year ago

We tested on the LibriMix test set and got the following results:

1) noisy: PESQ = 1.22, SI-SDR = 2.5 dB
2) N=50: PESQ = 1.94, SI-SDR = 11.4 dB
3) N=20: PESQ = 1.91, SI-SDR = 11.3 dB
4) N=10: PESQ = 1.39, SI-SDR = 9.5 dB

When N drops from 20 to 10, the white noise in the utterance increases significantly. The hyperparameters and training configuration in our experiment are the same as in your paper. The different dataset could be one reason: LibriMix has lower input SNRs (-8 dB to 8 dB), which could make the task more difficult.
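For reference, the SI-SDR numbers above follow the standard scale-invariant definition, which a few lines of NumPy can reproduce (a generic sketch, not tied to any particular toolkit):

```python
import numpy as np

def si_sdr(estimate, reference):
    """Scale-Invariant SDR in dB (higher is better)."""
    # Zero-mean both signals, per the usual SI-SDR convention.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to get the target component.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10(np.dot(target, target) / np.dot(noise, noise))
```

Because of the projection, rescaling the estimate leaves the metric unchanged, so it measures distortion independently of output gain.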

jmlemercier commented 1 year ago

Well, the dataset used in the paper has input SNRs sampled uniformly between -6 dB and 14 dB, so your condition is indeed more difficult. Also, the noise types differ: LibriMix uses WHAM! noises, while we used CHiME noises. I believe that is sufficient to explain the deviation between our results; it does not look like a bug to me. Thanks for reporting!