sp-uhh / sgmse

Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
MIT License

Code for WSJ0-REVERB dataset reproduction #7

Closed: stefan-baumann closed this issue 1 year ago

stefan-baumann commented 2 years ago

Hi, thank you very much for the great work on the models and for making this repository available. I am trying to reproduce some of the results from your paper and was wondering whether you could also release the code for creating the WSJ0-REVERB dataset as used in the paper. The paper already gives quite a bit of information, but releasing the code to reproduce the exact dataset you used to train your model would be really appreciated, as comparisons are difficult otherwise. Access to the test set in particular would be valuable for fair comparisons.

I also have some other questions about the models and this repo - I'll create separate issues for them so as not to clutter up this one, I hope that's okay :)

All the best, Stefan

julius-richter commented 2 years ago

Hi Stefan, thanks for your interest in our work!

We have added the code for the WSJ0-REVERB dataset creation in our latest commit. You'll find the script here: https://github.com/sp-uhh/sgmse/blob/main/preprocessing/create_wsj0_reverb.py

To run the script, you need to set its command-line arguments (see the argparse definitions at the top of the script).
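The invocation looks something along these lines; the argument names and paths below are placeholders for illustration, so check the argparse definitions in the script itself for the actual interface:

```
# Hypothetical invocation; flag names and paths are placeholders, not the
# script's confirmed interface. See preprocessing/create_wsj0_reverb.py.
python preprocessing/create_wsj0_reverb.py \
    --wsj0_dir /path/to/wsj0 \
    --target_dir /path/to/wsj0_reverb
```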

stefan-baumann commented 2 years ago

Hi Julius, thank you very much for the incredibly quick response and for sharing the code! I'll give it a try.

stefan-baumann commented 1 year ago

Hey, I've been trying to reproduce a training run on the WSJ0-REVERB dataset, created from the standard WSJ0 dataset using the exact code you provided, and I'm seeing some weird behaviour: comb-filtering-like artifacts in the target data, and what looks like a very short time delay between the reverberant and anechoic signals in some samples. I can somewhat understand why there would be a time delay (although it would make more sense to me if the pairs were fully aligned), but I don't understand how you get the really clean samples presented on the project page with a model that's trained on these randomly comb-filtered targets.
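For what it's worth, here's the kind of check I mean for the delay: cross-correlating each anechoic/reverberant pair and reading off the lag of the correlation peak. This is just a sketch (the helper name is mine), assuming both signals are NumPy arrays at the same sample rate:

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

def estimate_lag(anechoic: np.ndarray, reverberant: np.ndarray) -> int:
    """Lag (in samples) at which the reverberant signal best aligns with
    the anechoic one; positive means the reverberant copy is delayed."""
    xcorr = correlate(reverberant, anechoic, mode="full")
    lags = correlation_lags(len(reverberant), len(anechoic), mode="full")
    return int(lags[np.argmax(np.abs(xcorr))])
```

A lag of n samples at 16 kHz corresponds to n / 16000 seconds of misalignment.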

Here's a grid of 16 anechoic samples and their reverberant counterparts from the dataset, generated with your script, for reference:

[Image: wsj0_reverb_anechoic]
[Image: wsj0_reverb_reverb]
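The grids above are just log-magnitude spectrograms of the first 16 files of each subset. A minimal version of the plotting code, assuming the generated WAV files sit in `anechoic/` and `reverberant/` subdirectories (the directory layout is an assumption; adjust to wherever the script writes its output):

```python
import glob
import matplotlib.pyplot as plt
import soundfile as sf

def spectrogram_grid(wav_dir: str, out_png: str, n: int = 16, fs: int = 16000):
    """Plot spectrograms of the first n WAV files in wav_dir as a 4x4 grid."""
    files = sorted(glob.glob(f"{wav_dir}/*.wav"))[:n]
    fig, axes = plt.subplots(4, 4, figsize=(16, 12))
    for ax, path in zip(axes.ravel(), files):
        x, _ = sf.read(path)
        ax.specgram(x, NFFT=512, Fs=fs, noverlap=384)
        ax.set_title(path.split("/")[-1], fontsize=6)
        ax.set_axis_off()
    fig.savefig(out_png, dpi=150, bbox_inches="tight")

# Directory names are placeholders for the script's output location.
spectrogram_grid("wsj0_reverb/anechoic", "wsj0_reverb_anechoic.png")
spectrogram_grid("wsj0_reverb/reverberant", "wsj0_reverb_reverb.png")
```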

jmlemercier commented 1 year ago

Hi Stefan, thank you for spotting this. This is indeed a simple error on our side, coupled with a bug in the PyRoomAcoustics package.

When we simplified the dataset generation code for release, we made a mistake in the default method used to simulate the reverberation. The code you used activates ray tracing in addition to the image source model. The problem, apparently, is that if you set a very high absorption coefficient to generate the "dry" room (corresponding to the so-called anechoic targets), ray tracing messes up the simulation and produces the comb-filter artifacts shown here.

We have uploaded a new version of the code in which only the image source model is used, with ray tracing deactivated. The artifacts should disappear, and the results should then be consistent with the data we used for the approach presented in the paper.
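In terms of pyroomacoustics, the fix boils down to constructing the room with the pure image source model and never enabling ray tracing. A minimal sketch of that setup (the geometry, positions, and target T60 below are placeholder values, not the ones used in the script):

```python
import numpy as np
import pyroomacoustics as pra

fs = 16000
room_dims = [5.0, 4.0, 3.0]   # placeholder geometry in metres
rt60_dry = 0.05               # near-anechoic target reverberation time

# inverse_sabine gives the absorption coefficient and ISM order for a target T60
absorption, max_order = pra.inverse_sabine(rt60_dry, room_dims)

# Pure image source model: ray tracing stays off (it would be enabled by
# passing ray_tracing=True and calling room.set_ray_tracing()), which is
# what caused the comb-filter artifacts at very high absorption.
room = pra.ShoeBox(
    room_dims,
    fs=fs,
    materials=pra.Material(absorption),
    max_order=max_order,
)
room.add_source([2.0, 2.0, 1.5], signal=np.random.randn(fs))  # placeholder signal
room.add_microphone([3.0, 2.5, 1.5])
room.simulate()
dry_target = room.mic_array.signals[0]
```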

Don't hesitate to reopen the issue if this does not solve the problem; for now, I'll ask Julius to close it.

Best, Jean-Marie