microsoft / MS-SNSD

The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) levels desired.
MIT License
468 stars 142 forks source link

'noisescalar' derivation in clean speech and noise mix #18

Open haozh7109 opened 3 years ago

haozh7109 commented 3 years ago

Hi,

Thanks for sharing this open-source dataset. I am trying to apply this code to generate synthetic noisy datasets for speech processing. In my practice, I observed that the code-generated data has only half of SNR than the code nominated, which I tested from Audacity. After further checked the 'audiolib.py', I think the 'noisescalar' derivation (line 68) seems to be incorrect.

In the 'audiolib.py' code, the original code is: noisescalar = np.sqrt(rmsclean / (10(snr/20)) / rmsnoise)**

Where I think the square root shall not be used for the noise scalar since the SNR is calculated based on RMS in the derivation, and it shall be corrected as below in the scaling of the noise level. noisescalar = rmsclean / (10(snr/20)) / rmsnoise**

In my test, I got the synthetic noisy data with the correct SNR level after this correction. So could you please correct it in the code?

ithoidis commented 3 years ago

I agree with @haozh7109, the SNR computation should be corrected.