sp-uhh / sgmse

Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
MIT License

Reproducing results with VB-DMD testset #19

Closed: k-sparrow closed this issue 7 months ago.

k-sparrow commented 1 year ago

Hi,

Thank you very much for your work and also for the published resources.

As I don't have access to the WSJ0 dataset, I'm trying to reproduce the paper's results on VB-DMD (downloaded from https://datashare.ed.ac.uk/handle/10283/2791).

I have a couple of questions:

  1. Which checkpoint should be used with enhancement.py for the VB-DMD noisy test set?
  2. I've played a little with the code, and it seems that the .wav files in the VB-DMD noisy test set are sampled at 48 kHz, while the models were trained on 16 kHz audio (https://github.com/sp-uhh/sgmse/issues/16#issuecomment-1333990502). Should the .wav files be downsampled to 16 kHz before being passed to the model?
  3. I also want to evaluate the dereverberation checkpoint. Can the preprocessing scripts also be used on the VB-DMD test set?
cobalamin commented 1 year ago

Hi, thanks for your interest in our work!

  1. The train_vb_29nqe0uh... checkpoint file was trained on VB-DMD specifically, whereas the train_wsj0_2cta4cov... file was trained on WSJ0-CHiME3. You should get best results for VB-DMD with the former checkpoint.
  2. You're right: the files should be resampled to 16 kHz first.
  3. I don't see why not. You would just need to change the subfolders in the preprocessing script to match the folder structure of your VB-DMD dataset (currently it refers to si_dt_05 for the validation set, etc.).
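As a minimal sketch of the resampling step in answer 2, the stdlib-only Python below low-pass filters a 48 kHz signal with a windowed-sinc FIR and then keeps every third sample to get 16 kHz. It assumes 16-bit mono PCM input (the WAV helper checks this and raises otherwise). The function names, the 101-tap filter length and the cutoff margin are illustrative choices, not part of sgmse.

```python
import array
import math
import sys
import wave
from pathlib import Path


def lowpass_taps(num_taps: int, cutoff: float) -> list[float]:
    """Hamming-windowed sinc low-pass taps; cutoff in cycles per input sample (0 < cutoff < 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        # Ideal low-pass impulse response 2*fc*sinc(2*fc*t), with the t == 0 limit handled explicitly.
        h = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        # Hamming window to suppress stopband ripple.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(h * w)
    total = sum(taps)
    return [h / total for h in taps]  # normalize to unity gain at DC


def decimate(samples: list[float], factor: int, num_taps: int = 101) -> list[float]:
    """Anti-alias filter, then keep every `factor`-th sample (zero-phase, centered FIR)."""
    # Place the cutoff slightly below the new Nyquist (0.5 / factor) to leave room for the transition band.
    taps = lowpass_taps(num_taps, 0.9 * 0.5 / factor)
    half = num_taps // 2
    n = len(samples)
    out = []
    for i in range(0, n, factor):  # only compute the outputs we keep
        acc = 0.0
        for k, h in enumerate(taps):
            j = i + half - k
            if 0 <= j < n:  # treat samples outside the signal as zeros
                acc += h * samples[j]
        out.append(acc)
    return out


def resample_wav_48k_to_16k(src_path: str, dst_path: str) -> None:
    """Read a 48 kHz, 16-bit mono WAV file and write a 16 kHz copy."""
    with wave.open(src_path, "rb") as f:
        if f.getframerate() != 48000 or f.getsampwidth() != 2 or f.getnchannels() != 1:
            raise ValueError(f"{src_path}: expected 48 kHz, 16-bit, mono PCM")
        pcm = array.array("h", f.readframes(f.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    y = decimate([s / 32768.0 for s in pcm], 3)
    # Convert back to int16 with clipping.
    out = array.array("h", (max(-32768, min(32767, round(v * 32768.0))) for v in y))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst_path, "wb") as g:
        g.setnchannels(1)
        g.setsampwidth(2)
        g.setframerate(16000)
        g.writeframes(out.tobytes())


def resample_dir(src_dir: str, dst_dir: str) -> None:
    """Resample every .wav file in src_dir into dst_dir, keeping file names."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for p in sorted(Path(src_dir).glob("*.wav")):
        resample_wav_48k_to_16k(str(p), str(dst / p.name))
```

To prepare a whole folder, you could call `resample_dir` on the noisy test set directory (`noisy_testset_wav` in the datashare download, if your copy follows that layout) and then point enhancement.py at the output folder. Pure-Python filtering is slow. In practice, `scipy.signal.resample_poly(x, 1, 3)` or `torchaudio.functional.resample(waveform, 48000, 16000)` do the same job much faster.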

I hope that helps!