jtkim-kaist / VAD

Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.

Questions about the data normalization #25

Open cqjjjzr opened 5 years ago

cqjjjzr commented 5 years ago

Hi Kim,

Apologies for bothering you so many times, but I am having trouble understanding your normalization code. I found the following code in acoustic_feat_ex.m:

%% Save global normalization factor

global_mean = train_mean / length(audio_list);
global_std = train_std / length(audio_list);
save([save_dir, '/global_normalize_factor'], 'global_mean', 'global_std');

and in every data_reader_XXX.py:

norm_param = sio.loadmat(self._norm_dir+'/global_normalize_factor.mat')
self.train_mean = norm_param['global_mean']
self.train_std = norm_param['global_std']

My questions are:

  1. Is a single global normalization factor for the whole dataset saved in acoustic_feat_ex.m? Why not calculate a factor for every single training file and normalize that file with it?
  2. If so, why is this factor also used during the prediction phase (the data_reader_XXX.py files are also used during prediction)? Is this a mistake?

Thanks in advance!
Charlie Jiang

jtkim-kaist commented 5 years ago
  1. If the sample files have very different noise characteristics and high noise energy, the per-file mean and variance can depend on the noise signal rather than the speech signal. However, the purpose of VAD is to exploit the statistical characteristics of the speech signal. The global mean and variance are likely to reflect the speech signal rather than the noise, because severe noise conditions are not frequent.

  2. It is not a mistake: we cannot compute a global mean and variance from the test dataset, so the training statistics are reused. However, if you train with the local mean and variance of each sample file, you can also use the local mean and variance of the test file if you want.
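
For illustration only, the two options discussed above can be sketched in NumPy as follows. This is not code from the toolkit: the function names `normalize_global` and `normalize_local`, the `eps` guard, and the random feature matrices are assumptions made for the example. It only shows that global normalization reuses the training statistics on a test file, while local normalization recomputes them from that file.

```python
import numpy as np

def normalize_global(feat, train_mean, train_std, eps=1e-8):
    """Normalize with statistics computed once on the training set."""
    return (feat - train_mean) / (train_std + eps)

def normalize_local(feat, eps=1e-8):
    """Normalize with the mean and std of this single file."""
    return (feat - feat.mean(axis=0)) / (feat.std(axis=0) + eps)

rng = np.random.default_rng(0)

# Hypothetical features: rows are frames, columns are feature dimensions.
train_feat = rng.normal(loc=1.0, scale=2.0, size=(1000, 4))
# A test file whose statistics are shifted, e.g. by a different noise level.
test_feat = rng.normal(loc=3.0, scale=2.0, size=(200, 4))

train_mean = train_feat.mean(axis=0)
train_std = train_feat.std(axis=0)

global_out = normalize_global(test_feat, train_mean, train_std)
local_out = normalize_local(test_feat)

# Local normalization always centers the file at zero; global normalization
# keeps the offset between this file and the training data.
print("global mean per dim:", global_out.mean(axis=0))
print("local mean per dim: ", local_out.mean(axis=0))
```

With global statistics, the shift between a file and the training data survives normalization, so the model sees it. With local statistics that shift is removed, which is only consistent if the model was trained on locally normalized files as well.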

cqjjjzr commented 5 years ago

Thanks for your reply!

One more question: when the program is used in a production environment, is there any difference between using the local mean and variance of each input file and using the global training mean and standard deviation? If so, which should I choose?