Vadim2S closed this issue 4 months ago.
Hello,
Thank you for bringing up this issue. I suspect that your dataset, with an SNR range of -20 to -10 dB, is too challenging for the model. At such low SNRs the model cannot learn the score function well, which would explain the poor metrics you are seeing.
Typically, in our experiments, we use datasets with a more moderate SNR range, e.g., between 0 and 20 dB. This range allows the model to perform much better and produce more intelligible and higher-quality speech outputs.
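For reference, a noisy training pair at a chosen SNR is commonly built by scaling the noise so that the clean-to-noise power ratio hits the target. Below is a minimal NumPy sketch, assuming mono float arrays at the same sample rate and a noise clip at least as long as the speech. `mix_at_snr` is a hypothetical helper, not part of SGMSE, and the uniform 0 to 20 dB draw simply mirrors the moderate range mentioned above.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db, eps=1e-12):
    """Return clean + scaled noise so that 10*log10(P_clean / P_noise) == snr_db.

    Hypothetical helper; assumes `noise` is at least as long as `clean`.
    """
    noise = noise[: len(clean)]          # crop noise to the speech length
    p_clean = np.mean(clean ** 2)        # average power of the clean signal
    p_noise = np.mean(noise ** 2) + eps  # average power of the noise (eps avoids /0)
    # Choose the gain so the scaled noise power equals p_clean / 10^(snr_db / 10).
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise


# Toy usage with random signals standing in for real speech and noise (1 s at 16 kHz).
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
snr_db = rng.uniform(0.0, 20.0)  # one SNR per training example, drawn from 0 to 20 dB
noisy = mix_at_snr(clean, noise, snr_db)
```

Sampling a fresh SNR per example (rather than a fixed value) gives the model a spread of difficulty; shifting the bounds of `rng.uniform` is enough to reproduce either the moderate range above or the harder -20 to -10 dB setting.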
I am trying SGMSE. I created my own dataset: 90 GB, with all clean and noisy audio normalized and SNRs between -20 and -10 dB. I trained for 17 epochs, and after 40K iterations all metrics stabilized at very low values (ESTOI 0.65, PESQ 1.1, SI-SDR -20 dB). Of course, the sound is awful and the speech is not intelligible.
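To put an SI-SDR figure like -20 dB in context, one option is to score the unprocessed noisy files with the same metric; if the enhanced output does not beat that baseline, the model is not improving on its input. A minimal NumPy sketch of the usual zero-mean SI-SDR definition follows. `si_sdr` here is a hypothetical stand-alone function, not the evaluation code shipped with SGMSE, so small numerical differences from other implementations are possible.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB (zero-mean variant); hypothetical helper."""
    estimate = estimate - np.mean(estimate)
    reference = reference - np.mean(reference)
    # Project the estimate onto the reference to get the optimally scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target  # everything not explained by the reference
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(residual, residual) + eps))


# Toy usage: a signal plus weak independent noise scores high,
# while an equal-power noisy mix scores near 0 dB.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
print(f"{si_sdr(clean + 0.1 * rng.standard_normal(16000), clean):.1f} dB")
print(f"{si_sdr(clean + rng.standard_normal(16000), clean):.1f} dB")
```

Because the target is rescaled by `alpha`, the score ignores overall gain, so normalizing the output loudness does not change it.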
What SNR settings did you use for your datasets, and how heavily degraded was the noisiest audio you have tried? I would appreciate any recommendations for my case.