jtkim-kaist / VAD

Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.

Problem with the training data #37

Closed BeginnerYifei closed 3 years ago

BeginnerYifei commented 3 years ago

Hi everyone,

I added 1 s of silence before and after each utterance from the TIMIT dataset. However, the ACAM and bDNN models couldn't learn from the training data. Instead, they simply predict 1 for every sample (as shown in the following picture).

[image: results problem]
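A model that outputs 1 everywhere can be a sign that the labels are dominated by one class, so one quick check is the fraction of samples labeled 1. The snippet below is only a diagnostic sketch, not a confirmed cause of the problem: the helper names and the 3 s utterance length are made up for illustration, and the labels are assumed to be a per-sample 0/1 array.

```python
import numpy as np

def label_balance(labels):
    """Return the fraction of samples labeled 1 (speech)."""
    return float(np.asarray(labels).mean())

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the more frequent class."""
    frac = label_balance(labels)
    return max(frac, 1.0 - frac)

# Hypothetical utterance: 1 s silence, 3 s speech, 1 s silence at 16 kHz.
sr = 16000
labels = np.concatenate([np.zeros(sr), np.ones(3 * sr), np.zeros(sr)])

print("speech fraction:", label_balance(labels))
print("always-majority accuracy:", majority_baseline_accuracy(labels))
```

If the model's accuracy is no better than the always-majority figure, it has probably collapsed to the majority class rather than learned anything from the features.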

The training data is sampled at 16000 Hz and labeled according to the .phn descriptions (see also the following picture). Does anyone have an idea how to fix this?

[image: 1s_utterence]
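For reference, the padding and labeling described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact script used here: it assumes each .phn line has the form `start_sample end_sample phone`, treats `h#`, `pau` and `epi` as non-speech (a labeling choice, others are possible), and uses a made-up four-line transcription. The part worth double-checking is that the label indices are shifted by the length of the leading silence.

```python
import numpy as np

SR = 16000                          # sampling rate in Hz
PAD = SR                            # 1 s of silence on each side
NON_SPEECH = {"h#", "pau", "epi"}   # assumption: phones treated as silence

def parse_phn(text):
    """Parse .phn content into (start_sample, end_sample, phone) tuples."""
    segments = []
    for line in text.strip().splitlines():
        start, end, phone = line.split()
        segments.append((int(start), int(end), phone))
    return segments

def pad_and_label(audio, segments, pad=PAD):
    """Pad audio with silence and build per-sample 0/1 labels."""
    silence = np.zeros(pad, dtype=audio.dtype)
    padded = np.concatenate([silence, audio, silence])
    labels = np.zeros(len(padded), dtype=np.int8)
    for start, end, phone in segments:
        if phone not in NON_SPEECH:
            # Shift by `pad` so labels line up with the padded audio.
            labels[pad + start : pad + end] = 1
    return padded, labels

# Made-up transcription for a 10000-sample utterance.
phn_text = """0 2000 h#
2000 5000 sh
5000 8000 iy
8000 10000 h#"""
audio = np.zeros(10000, dtype=np.int16)  # placeholder waveform

padded, labels = pad_and_label(audio, parse_phn(phn_text))
print(len(padded), int(labels.sum()))
```

Plotting `labels` over `padded` for a few files is a quick way to confirm that the speech region sits where the waveform says it should.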

examples of training data: https://mcgill-my.sharepoint.com/:u:/g/personal/yifei_zhao_mail_mcgill_ca/EV0mKeH4U7BFpW_ZmyRBQZQBCDSP0quq4rgVsX0CtNlXfw?e=LfW8oJ

Thx!!!