vivjay30 / Cone-of-Silence

The Cone of Silence:
MIT License

Mean and STD of the signal peak #11

Open rv781297 opened 3 years ago

rv781297 commented 3 years ago

Hi Vivek,

Thanks for your awesome work. What do FG_VOL_MIN, FG_VOL_MAX, BG_VOL_MIN, and BG_VOL_MAX in generate_dataset.py mean, and how did you calculate these four values?

Best regards, KenHuang

hust-cxl commented 3 years ago

I guess these constants are used as scale factors (or SNR) to rescale the volume of the wav files.

In my view, the two statements "fg_target = np.random.uniform(FG_VOL_MIN, FG_VOL_MAX)" and "fg_signals = fg_signals * fg_target / abs(fg_signals).max()" act as a normalization-like step: the signal is divided by its own peak and then multiplied by a random target, so its peak amplitude lands between the min and max values.

Hope this helps.
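The peak scaling described above can be sketched as a small standalone function. This is a minimal illustration, not the repository's code: the bounds below are placeholder values (the real constants in generate_dataset.py are not quoted in this thread), and the helper name scale_to_random_peak and the zero-peak guard are assumptions added for the example.

```python
import numpy as np

# Placeholder bounds for illustration only; these are NOT the values
# defined in generate_dataset.py.
FG_VOL_MIN = 0.15
FG_VOL_MAX = 0.4


def scale_to_random_peak(signal, vol_min, vol_max, rng):
    """Rescale `signal` so its peak absolute value equals a random target.

    Hypothetical helper: draws the target uniformly from [vol_min, vol_max)
    and returns (scaled_signal, target).
    """
    target = rng.uniform(vol_min, vol_max)
    peak = np.abs(signal).max()
    if peak == 0:
        # An all-zero signal cannot be rescaled; return it unchanged.
        return signal.copy(), target
    return signal * target / peak, target


rng = np.random.default_rng(0)
fg_signals = rng.standard_normal(16000)  # stand-in for a loaded voice clip
scaled, fg_target = scale_to_random_peak(fg_signals, FG_VOL_MIN, FG_VOL_MAX, rng)
```

After the call, the largest absolute sample in scaled equals fg_target, whatever the original loudness of the clip was.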

rv781297 commented 3 years ago

Hi @hust-cxl,

Thanks for your help. I also got more detail about this from Vivek: "fg_target = np.random.uniform(FG_VOL_MIN, FG_VOL_MAX)" means we randomly choose a value between the min and max. This value is the target for the peak of the foreground signal (voice). We do the same thing for the background. This is an easy way to set the volume levels in the data. It would probably be better to normalize by standard deviation or decibels rather than by the peak of the signal, but that is how the current version works.
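For comparison, the alternative mentioned above (normalizing by standard deviation or decibels instead of the peak) might look like the sketch below. None of this is in the repository as far as this thread shows: the function names scale_to_rms and scale_to_dbfs are hypothetical, and the sketch uses RMS, which equals the standard deviation only for a zero-mean signal.

```python
import numpy as np


def scale_to_rms(signal, target_rms):
    """Rescale `signal` so its root-mean-square level equals `target_rms`.

    Hypothetical helper; for a zero-mean signal, RMS and standard
    deviation are the same quantity.
    """
    rms = np.sqrt(np.mean(np.square(signal)))
    if rms == 0:
        # Silence has no level to normalize; return it unchanged.
        return signal.copy()
    return signal * target_rms / rms


def scale_to_dbfs(signal, target_dbfs):
    """Rescale `signal` to a level in dB relative to an RMS of 1.0."""
    # Convert decibels to a linear amplitude: 20 dB is a factor of 10.
    return scale_to_rms(signal, 10.0 ** (target_dbfs / 20.0))


rng = np.random.default_rng(0)
clip = rng.standard_normal(16000)  # stand-in for a loaded audio clip
quiet = scale_to_dbfs(clip, -20.0)
```

Unlike peak scaling, this keeps the average energy consistent across clips, so a single loud transient does not make the rest of a clip quieter. The trade-off is that the peak is no longer bounded, so samples may need a clipping check before being written to a file.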

Thanks, KenHuang