sp-uhh / sgmse

Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
MIT License
454 stars, 69 forks

Overloaded results. Dataset requirements? #37

Closed: Vadim2S closed this issue 2 months ago

Vadim2S commented 8 months ago

I am trying SGMSE. I created my own dataset: 90 GB, all clean and noisy audio normalized, SNR from -20 to -10 dB. I trained for 17 epochs and got a strange result:

1) If I use your model, I get x_hat.abs().max() < 1.0

2) If I use my models, I get 3.0 < x_hat.abs().max() < 4.5, i.e. a heavily overloaded signal

I can normalize the output, but I noticed that your code does not expect x_hat.abs().max() > 1.0, and this behaviour may indicate that something is going wrong. Is something going wrong?
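
For reference, the peak check and the normalization mentioned above could look roughly like the sketch below. It is not code from the sgmse repository. The helper names `peak` and `normalize_peak`, the `target_peak` value of 0.95, and the sample values are hypothetical, chosen only for illustration. Plain Python lists keep the sketch dependency-free. The same arithmetic applies to a torch tensor.

```python
def peak(signal):
    """Largest absolute sample value (same quantity as x_hat.abs().max())."""
    return max(abs(s) for s in signal)


def normalize_peak(signal, target_peak=0.95):
    """Scale the signal so that its peak equals target_peak.

    An all-zero signal is returned unchanged to avoid dividing by zero.
    """
    p = peak(signal)
    if p == 0.0:
        return list(signal)
    gain = target_peak / p
    return [s * gain for s in signal]


# Hypothetical enhanced output whose peak exceeds 1.0, as in case 2) above.
x_hat = [0.5, -3.6, 1.2, 4.0]
overloaded = peak(x_hat) > 1.0      # True: samples would clip in fixed-point audio
x_norm = normalize_peak(x_hat)      # peak is now target_peak (0.95)
print(overloaded, peak(x_norm))
```

Peak normalization only rescales the waveform so that it no longer clips when written to a file. It does not fix whatever caused the model to produce such large values in the first place.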

julius-richter commented 2 months ago

Hello,

Thank you for bringing up this issue. I assume that your dataset with an SNR range of -20 to -10 dB is too challenging for the model. At such low SNRs, the model is unable to learn the score function well, which explains why you have poor metrics.

Typically, in our experiments, we use datasets with a more moderate SNR range, e.g., between 0 and 20 dB. This range allows the model to perform much better and produce more intelligible and higher-quality speech outputs.
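
As an illustration of what a 0 to 20 dB training mixture means in practice, here is a minimal sketch of mixing a clean signal with noise at a target SNR. It is an assumption about how such data could be built, not the data-preparation code used in our experiments. The function names, the 440 Hz test tone, the 16 kHz sampling rate, and the Gaussian noise are all hypothetical.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of the signal."""
    return sum(s * s for s in signal) / len(signal)


def snr_db(clean, noise):
    """Signal-to-noise ratio in dB between a clean signal and a noise signal."""
    return 10.0 * math.log10(power(clean) / power(noise))


def mix_at_snr(clean, noise, target_snr_db):
    """Scale the noise so the clean-to-noise power ratio equals target_snr_db,
    then add it to the clean signal. Returns (noisy, scaled_noise)."""
    ratio = 10.0 ** (target_snr_db / 10.0)
    gain = math.sqrt(power(clean) / (power(noise) * ratio))
    scaled_noise = [n * gain for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled_noise)]
    return noisy, scaled_noise


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# One second of a 440 Hz tone at 16 kHz stands in for clean speech.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

target = rng.uniform(0.0, 20.0)  # draw an SNR from the moderate range
noisy, scaled_noise = mix_at_snr(clean, noise, target)
print(round(target, 2), round(snr_db(clean, scaled_noise), 2))
```

The two printed values should agree, since the noise gain is derived directly from the target SNR. Replacing the range with -20 to -10 dB in the same sketch gives mixtures in which the noise power is 10 to 100 times the speech power.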