Closed Vadim2S closed 2 months ago
Hello,
Thank you for bringing up this issue. I assume that your dataset, with an SNR range of -20 to -10 dB, is too challenging for the model. At such low SNRs the model is unable to learn the score function well, which would explain the poor metrics.
Typically, in our experiments, we use datasets with a more moderate SNR range, e.g., between 0 and 20 dB. This range allows the model to perform much better and produce more intelligible and higher-quality speech outputs.
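To make the SNR range concrete, here is a minimal sketch of mixing a clean signal with noise at a chosen SNR. It is not the data-preparation code from this repository; `mix_at_snr` and `measured_snr_db` are hypothetical helper names, and the sine and white-noise signals are only stand-ins for real speech and noise recordings.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR = 10 * log10(clean_power / (scale^2 * noise_power))
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    residual = noisy - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))

# Stand-in signals (assumption): 1 s of a 440 Hz tone and white noise at 16 kHz.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)

noisy_moderate = mix_at_snr(clean, noise, snr_db=10.0)   # inside 0 to 20 dB
noisy_hard = mix_at_snr(clean, noise, snr_db=-15.0)      # inside -20 to -10 dB
print(measured_snr_db(clean, noisy_moderate), measured_snr_db(clean, noisy_hard))
```

At -15 dB the noise power is roughly 30 times the speech power, which gives a sense of how little of the clean signal is left in such a mixture.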
I am trying SGMS. I created my own dataset: 90 GB, all clean and noisy audio normalized, SNR from -20 to -10 dB. I trained for 17 epochs and got a strange result:
1) If I use your model, I get x_hat.abs().max() < 1.0.
2) If I use my models, I get 3.0 < x_hat.abs().max() < 4.5, i.e. a heavily overloaded signal.
I can normalize the output, but I noticed that your code does not expect x_hat.abs().max() > 1.0, and this behaviour may indicate that something has gone wrong. Has something gone wrong?
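For reference, the peak check and the normalization mentioned above can be sketched roughly like this. It uses NumPy arrays rather than the tensors in the project, and `peak_normalize` and `max_peak` are hypothetical names, not part of the repository's code. Note that this only rescales an overloaded output; it does not address why the model produces it.

```python
import numpy as np

def peak_normalize(x, max_peak=1.0):
    """Return (signal, original_peak); rescale only if the peak exceeds max_peak."""
    peak = np.max(np.abs(x))
    if peak > max_peak:
        return x * (max_peak / peak), peak
    return x, peak

# Stand-in for an enhanced output whose peak is in the reported 3.0 to 4.5 range.
x_hat = np.array([0.5, -3.0, 4.5, 0.1])
x_norm, peak = peak_normalize(x_hat)
print(peak, np.max(np.abs(x_norm)))
```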