sp-uhh / sgmse

Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
MIT License

Cannot get good results with a really bad dataset #38

Closed: Vadim2S closed this issue 2 months ago

Vadim2S commented 8 months ago

I am trying SGMSE. I created my own dataset: 90 GB, all clean and noisy audio normalized, SNR from -20 to -10 dB. I trained for 17 epochs, and after 40K iterations all metrics stabilized at very low values (ESTOI 0.65, PESQ 1.1, SI-SDR -20 dB). Of course, the sound is awful and the speech cannot be recognized.

What are the SNR parameters of your dataset, and how bad was the audio you tried? I would appreciate any recommendations for my case.

julius-richter commented 2 months ago

Hello,

Thank you for bringing up this issue. I assume that your dataset, with an SNR range of -20 to -10 dB, is too challenging for the model. At such low SNRs the model is unable to learn the score function well, which explains the poor metrics.

Typically, in our experiments, we use datasets with a more moderate SNR range, e.g., between 0 and 20 dB. This range allows the model to perform much better and produce more intelligible and higher-quality speech outputs.
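
To make the SNR ranges above concrete, here is a minimal sketch of how a noisy mixture at a chosen SNR could be generated. It is not the data-preparation code of this repository: the function names `mix_at_snr` and `measured_snr_db`, the 16 kHz sampling rate, and the synthetic sine and white-noise signals are assumptions made only for illustration.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that `clean + scaled_noise` has the requested SNR in dB.

    Assumes `noise` is at least as long as `clean` (hypothetical helper).
    Returns the noisy mixture and the scaled noise.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(P_clean / (gain^2 * P_noise)), solved for gain.
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    return clean + scaled_noise, scaled_noise


def measured_snr_db(clean, noise):
    """Return the SNR in dB between a clean signal and an additive noise signal."""
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))


# Synthetic stand-ins for real recordings: a 220 Hz tone and white noise.
rng = np.random.default_rng(0)
sample_rate = 16000  # assumed sampling rate
clean = np.sin(2.0 * np.pi * 220.0 * np.arange(sample_rate) / sample_rate)
noise = rng.standard_normal(sample_rate)

# Draw a target SNR from the moderate range mentioned above (0 to 20 dB).
target_snr = rng.uniform(0.0, 20.0)
noisy, scaled_noise = mix_at_snr(clean, noise, target_snr)
print(f"target {target_snr:.2f} dB, measured {measured_snr_db(clean, scaled_noise):.2f} dB")
```

Drawing `target_snr` from `rng.uniform(-20.0, -10.0)` instead reproduces the range described in the question. At -20 dB the scaled noise has 100 times the power of the speech, which gives a sense of how little of the clean signal is left in the mixture.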