microsoft / DNS-Challenge

This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
Creative Commons Attribution 4.0 International

There is something wrong at line 169 in audiolib.py #165

Open ZihaoCui opened 1 year ago

ZihaoCui commented 1 year ago

The segmental_snr_mixer function in audiolib.py is wrong. Because normalization has already been applied to both signals, the noise scalar should be noisescalar = 1 / (10**(snr/20)), not noisescalar = rmsclean / (10**(snr/20)) / (rmsnoise+EPS).

Using this test file:

    import numpy as np
    import audiolib

    params = {'cfg':1, 'target_level_lower':-35,'target_level_upper':-25}
    temp = [0.1,0.2,0.3,0.4,0.5]
    for abs_a in range(len(temp)):
        for abs_b in range(len(temp)):
            base_a = np.random.binomial(1,temp[abs_a],size=(5000,1))
            base_b = np.random.binomial(1,temp[abs_b],size=(5000,1))
            print(temp[abs_a], temp[abs_b], audiolib.segmental_snr_mixer(params,base_a,base_b,snr=5))

and adding the following code after line 169 in audiolib.py:

    rmsclean1, rmsnoise2 = active_rms(clean=clean, noise=noisenewlevel)
    snr_test = 20*np.log10(rmsclean1/rmsnoise2)
    return snr_test, rmsclean, rmsnoise

we get the output below. Each line shows the two binomial probabilities (clean, noise) followed by (measured SNR in dB, rmsclean, rmsnoise). The requested SNR is 5 dB in every case:

    0.1 0.1 (4.809261190526833, 0.3199999999999999, 0.31304951684997046)
    0.1 0.2 (8.398437767900433, 0.3085449724108302, 0.45628938186199325)
    0.1 0.3 (9.541254719329668, 0.32557641192199405, 0.5491812087098392)
    0.1 0.4 (11.254207673313344, 0.30822070014844877, 0.6332456079595024)
    0.1 0.5 (11.975890986196333, 0.3174901573277508, 0.7088018058667739)
    0.2 0.1 (1.8278324324274429, 0.4436214602563766, 0.307896086366813)
    0.2 0.2 (5.030355204691368, 0.44676615807377346, 0.44833023542919775)
    0.2 0.3 (6.760912590556815, 0.4498888751680796, 0.5509990925582363)
    0.2 0.4 (8.027724031442382, 0.4460941604639091, 0.6321392251711642)
    0.2 0.5 (8.922584416928597, 0.45409250158970904, 0.7133021800050802)
    0.3 0.1 (0.009673346077310881, 0.5594640292279744, 0.3149603149604724)
    0.3 0.2 (3.2262700519281093, 0.5526300751859239, 0.4505552130427523)
    0.3 0.3 (5.140245659899135, 0.5464430436925699, 0.5553377350765928)
    0.3 0.4 (6.272472841406196, 0.5479051012721089, 0.6343500610861481)
    0.3 0.5 (7.247686760565433, 0.5499090833947007, 0.7123201527403249)
    0.4 0.1 (-1.350831784195107, 0.6315061361538776, 0.3039736830714132)
    0.4 0.2 (2.0700880682248917, 0.6378087487640788, 0.4551922670696416)
    0.4 0.3 (3.875478643051899, 0.6288083968904994, 0.5524490926773252)
    0.4 0.4 (4.854129294663155, 0.6463745044476923, 0.6356099432828279)
    0.4 0.5 (5.987176033927486, 0.634665266104897, 0.711055553385247)
    0.5 0.1 (-1.9690929118423788, 0.7103520254071215, 0.31843366656181304)
    0.5 0.2 (1.0023430870343295, 0.707530918052349, 0.44654227123532203)
    0.5 0.3 (2.6995947405835983, 0.7142828571371427, 0.5480875842417887)
    0.5 0.4 (4.139592830567267, 0.7037044834303671, 0.6373382147651275)
    0.5 0.5 (4.885728516741401, 0.707530918052349, 0.6982836100038435)
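The same effect can be reproduced without the repository. The following is only a minimal sketch, not the code from audiolib.py: the helpers rms, normalize_rms, and achieved_snr are hypothetical names, a plain RMS over the whole signal stands in for active_rms, and it assumes (as described above) that rmsclean and rmsnoise are measured before both signals are normalized to the same target level.

```python
import numpy as np

EPS = np.finfo(float).eps

def rms(x):
    """Plain RMS over the whole signal (simplified stand-in for active_rms)."""
    return np.sqrt(np.mean(x ** 2))

def normalize_rms(x, target_level=-25):
    """Scale x so that its RMS equals target_level in dBFS."""
    return x * (10 ** (target_level / 20) / (rms(x) + EPS))

def achieved_snr(clean, noise, snr, use_rms_ratio):
    """Return the SNR in dB actually obtained after normalizing and scaling."""
    # RMS values are measured before normalization, as in the issue.
    rmsclean, rmsnoise = rms(clean), rms(noise)
    clean = normalize_rms(clean)
    noise = normalize_rms(noise)
    if use_rms_ratio:
        # Scalar reported as problematic in the issue.
        noisescalar = rmsclean / (10 ** (snr / 20)) / (rmsnoise + EPS)
    else:
        # Scalar proposed in the issue.
        noisescalar = 1 / (10 ** (snr / 20))
    return 20 * np.log10(rms(clean) / rms(noise * noisescalar))

rng = np.random.default_rng(0)
clean = rng.binomial(1, 0.1, size=5000).astype(float)
noise = rng.binomial(1, 0.5, size=5000).astype(float)

# Offset from the requested 5 dB by 20*log10(rmsnoise/rmsclean).
print(achieved_snr(clean, noise, snr=5, use_rms_ratio=True))
# Close to the requested 5 dB.
print(achieved_snr(clean, noise, snr=5, use_rms_ratio=False))
```

Under these assumptions, only the second scalar gives back the requested SNR when the two inputs have different RMS values, which matches the pattern in the table above (the diagonal entries are near 5 dB, the off-diagonal ones are not).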

ZihaoCui commented 1 year ago

Or just comment out lines 166 and 167.

bngcode commented 1 year ago

I came to the same conclusion. The way the SNR is computed results in snr + 20*log10(rmsnoise/rmsclean) when using

noisescalar = rmsclean / (10**(snr/20)) / (rmsnoise+EPS).
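For reference, the offset follows in a few lines. This is a sketch that assumes both signals have the same RMS after normalization, written here as $r$ (a symbol introduced only for this derivation), with $s$ denoting noisescalar:

```latex
\begin{aligned}
s &= \frac{\mathrm{rmsclean}}{10^{\mathrm{snr}/20}\,\mathrm{rmsnoise}} \\
\mathrm{SNR}_{\mathrm{out}}
  &= 20\log_{10}\frac{r}{s\,r}
   = 20\log_{10}\frac{10^{\mathrm{snr}/20}\,\mathrm{rmsnoise}}{\mathrm{rmsclean}}
   = \mathrm{snr} + 20\log_{10}\frac{\mathrm{rmsnoise}}{\mathrm{rmsclean}}
\end{aligned}
```

With $s = 1 / 10^{\mathrm{snr}/20}$ instead, the same calculation gives $\mathrm{SNR}_{\mathrm{out}} = \mathrm{snr}$. As a check against the posted numbers, the row "0.1 0.5" has rmsclean ≈ 0.3175 and rmsnoise ≈ 0.7088, so 5 + 20·log10(0.7088/0.3175) ≈ 11.98 dB, in line with the measured 11.9759 dB.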