jtkim-kaist / VAD

Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.

question about DNN learning curve #20

Open · machinelearningisgood opened this issue 5 years ago

machinelearningisgood commented 5 years ago

Hi Kim, I hope you have a good weekend! When you have a minute, would you please take a look at my questions about your code and let me know what you think? Here are my questions:

  1. I have run your DNN code with my own TIMIT data and changed the number of iterations to get more data points, as shown in the attached VAD_DNN.py. However, the validation accuracy is higher than the training accuracy, which is very strange, and the training accuracy and cost fluctuate dramatically. For example, at the last point the training accuracy is 47% while the validation accuracy is 96%. Could you explain this?

  2. I also trained my own DNN without feature extraction and got the opposite result: the validation accuracy and cost fluctuate dramatically. It seems there is an overfitting problem, even though the dropout rate is already 0.7. Could you give me some advice?

I really appreciate your feedback. [Attachments: VAD_DNN.py; figures: your DNN model, my own learning curve]

jtkim-kaist commented 5 years ago

Thank you for your detailed question.

In the VAD field, I have found experimentally that dropout does not seem necessary, since we already use noise-corrupted data, which itself has a regularization effect. As your figure 1 shows, the DNN could not stably fit (train on) your noisy dataset. Perhaps the noise types in your training dataset are too diverse from one another. My advice is to reduce the batch size (e.g., to 256) and not to shuffle your batches; this can make training more stable. Please let me know after trying this advice. Thanks.
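For concreteness, a minimal sketch of this advice in a PyTorch data pipeline (the feature and label tensors here are placeholders, not the actual data from this thread):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder frame-level VAD data: 10k frames of 40-dim features,
# with 0 = non-speech and 1 = speech labels.
features = torch.randn(10000, 40)
labels = torch.randint(0, 2, (10000,))

loader = DataLoader(
    TensorDataset(features, labels),
    batch_size=256,   # reduced batch size, as suggested above
    shuffle=False,    # keep batches in their original temporal order
)
```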

machinelearningisgood commented 5 years ago

Thanks for your quick reply! I added silence segments to the front and end of each wave file to make the amounts of voice and noise almost equal, as described in your paper. The current batch size of my DNN model (the PyTorch one) is 2000, with no shuffling. I chose this large batch size so that I can train on one wave file at a time, and because I also want to plot the speech hit accuracy and non-speech hit accuracy besides the overall accuracy; with a small batch size those two rates may vary significantly, since one batch may contain only voice or only noise. I corrupted my dataset only with 10 dB factory noise. Anyway, I will try your advice to reduce the batch size and will post the results later.
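For reference, a minimal sketch (with a hypothetical helper name) of the per-class accuracies mentioned above, i.e. the speech hit rate and non-speech hit rate for one batch of binary frame decisions:

```python
import torch

def hit_rates(pred: torch.Tensor, label: torch.Tensor):
    """pred, label: 1-D tensors of 0/1 frame decisions (1 = speech)."""
    speech = label == 1
    nonspeech = label == 0
    # Undefined (nan) when a batch contains only one class -- the reason
    # a very small batch makes these per-class rates unstable.
    hr1 = (pred[speech] == 1).float().mean().item() if speech.any() else float("nan")
    hr0 = (pred[nonspeech] == 0).float().mean().item() if nonspeech.any() else float("nan")
    return hr1, hr0
```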

Now, I have also trained your DNN model (the TensorFlow one) with your dataset and the default batch size (128), and it shows a phenomenon similar to what I saw with my own dataset.
[Figure: learning curves trained with your data]

jtkim-kaist commented 5 years ago

Thank you for sharing.

My questions are:

  1. What is your validation set? (Also, higher accuracy on the validation set is not by itself a problem; it can change if you apply harsher noise to the validation set.)

  2. Although the training curve is not good, the model seems to be learning, judging by the validation cost. Anyway, to isolate the problem, please remove the noise from your dataset. In my experience, if you train on clean speech, the training and validation curves MUST be stable if there is no problem in your acoustic features, training method, and model.

  3. What features and normalization technique did you use?

  4. Have you tried decreasing the learning rate? If you use Adam, 1e-4 to 1e-5 may be optimal.
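A minimal sketch of that last suggestion in PyTorch (the model here is a stand-in, not the actual VAD network):

```python
import torch
import torch.nn as nn

model = nn.Linear(40, 2)  # stand-in for the actual VAD network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Optionally decay further toward 1e-5 over the course of training.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
```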

machinelearningisgood commented 5 years ago

Thank you for your patience.

  1. For my model, 75% of the original TIMIT training data was used for training and the remaining 25% for validation (both corrupted with factory noise at 10 dB SNR). For your model, I now use your dataset.

  2. For your model, I now use your dataset, but the training curve is still unstable (when 'smoothing' is set to zero on TensorBoard; it becomes stable when smoothing is increased to 0.954). Maybe there is some problem with the training; I will re-download your code and check it again.

  3. I do not use any handcrafted features; I train on the raw waveform, as in this paper (with a different model) [link: Google paper]. I used BatchNorm1d in PyTorch, as you did in TensorFlow (see the sketch after this list).

  4. I have not decreased the learning rate yet; Adam is used and my current learning_rate is 0.001. I will decrease it as you suggested.
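As mentioned in item 3, here is a minimal sketch (hypothetical architecture and layer sizes, not the actual model from this thread) of a raw-waveform VAD classifier using BatchNorm1d in PyTorch:

```python
import torch
import torch.nn as nn

class RawWaveVAD(nn.Module):
    """Toy raw-waveform frame classifier; layer sizes are placeholders."""
    def __init__(self, frame_len: int = 400):  # e.g. 25 ms at 16 kHz
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=80, stride=4),
            nn.BatchNorm1d(16),       # BatchNorm1d over the conv channels
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(16, 2),         # speech / non-speech logits
        )

    def forward(self, x):             # x: (batch, 1, frame_len)
        return self.net(x)

logits = RawWaveVAD()(torch.randn(8, 1, 400))  # -> shape (8, 2)
```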

machinelearningisgood commented 5 years ago

Hi Kim, I have tried for a few days:

  1. Reducing the batch size (to 512) does not help; it even makes the training loss less stable.

  2. Reducing the learning rate (1e-4, 1e-5) does not help. The validation loss is still unstable.

  3. Using your clean signal does work: the training and validation loss curves are more stable. Also, I changed the validation code to average the loss and accuracy over all wave files in the valid folder at each validation step, since it has only six wave files (the old code averaged five wave files for one validation and the next five for the next validation; with about 1000 wave files, averaging over all of them would take much longer). Maybe I should just use 5% of the training dataset as validation data, as described in your paper.
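A minimal sketch (with a hypothetical per-file evaluation helper) of that validation change, averaging over every file in the valid folder:

```python
import glob

def validate(model, valid_dir, evaluate_file):
    """Average loss/accuracy over all files; `evaluate_file(model, path)`
    is assumed to return (loss, accuracy, n_frames) for one wave file."""
    total_loss = total_acc = total_frames = 0
    for path in sorted(glob.glob(f"{valid_dir}/*.wav")):
        loss, acc, n = evaluate_file(model, path)
        total_loss += loss * n    # frame-weighted sums
        total_acc += acc * n
        total_frames += n
    return total_loss / total_frames, total_acc / total_frames
```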

jtkim-kaist commented 5 years ago

Thank you for sharing.

Note that there are some additional tricks beyond the training method described in my paper (because of the page limit).

Training on noisy data is a hard problem because of underfitting (your problem), which I have also experienced.

So, one of my tricks is to train the model progressively, from easy noise (high SNR) to hard noise (low SNR).
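A minimal sketch of such an easy-to-hard schedule (the mixing and epoch functions here are hypothetical, not from this repository):

```python
def curriculum_train(model, clean_files, noise_files, train_one_epoch,
                     snr_schedule=(20, 10, 5, 0), epochs_per_stage=5):
    """`train_one_epoch(model, clean_files, noise_files, snr_db)` is assumed
    to mix speech and noise at the given SNR and run one training epoch."""
    for snr_db in snr_schedule:        # easy (high SNR) -> hard (low SNR)
        for _ in range(epochs_per_stage):
            train_one_epoch(model, clean_files, noise_files, snr_db)
```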

machinelearningisgood commented 5 years ago

Thank you very much! I will try it.

ucasiggcas commented 5 years ago

Dear @machinelearningisgood, did you reproduce the paper, and could you please share your source code? Thanks.