Problem with the training data

Hi everyone,

I added 1 s of silence before and after each utterance from the TIMIT dataset. However, the ACAM and bDNN models could not learn from the training data; instead, they simply predict 1 for every sample (as shown in the following picture).

[Image: results problem]

The training data is sampled at 16 kHz and labeled according to the .phn descriptions (see also the following picture). Does anyone have an idea how to fix this?

[Image: 1s_utterence]
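For reference, here is a minimal sketch of the labeling I am describing. It is not the repository's own data preparation code. The function names, the choice of `h#`, `pau`, and `epi` as non-speech symbols, and the use of digital zeros for the added silence are all assumptions for illustration. The `.phn` content and the audio are synthetic.

```python
import numpy as np

SAMPLE_RATE = 16000                 # Hz, as stated in the post
PAD_SAMPLES = SAMPLE_RATE           # 1 s of silence on each side
NON_SPEECH = {"h#", "pau", "epi"}   # assumption: symbols treated as non-speech


def labels_from_phn(phn_text, n_samples):
    """Return one 0/1 label per sample of the original (unpadded) utterance."""
    labels = np.zeros(n_samples, dtype=np.int8)
    for line in phn_text.strip().splitlines():
        # Each .phn line is "<start_sample> <end_sample> <phone>"
        start, end, phone = line.split()
        if phone not in NON_SPEECH:
            labels[int(start):int(end)] = 1
    return labels


def pad_with_silence(wav, labels, pad=PAD_SAMPLES):
    """Pad audio with zeros and labels with 0 (non-speech) by the same amount."""
    wav_padded = np.concatenate(
        [np.zeros(pad, wav.dtype), wav, np.zeros(pad, wav.dtype)]
    )
    labels_padded = np.concatenate(
        [np.zeros(pad, labels.dtype), labels, np.zeros(pad, labels.dtype)]
    )
    return wav_padded, labels_padded


# Synthetic stand-in for one utterance and its .phn file (hypothetical values)
phn = """0 2000 h#
2000 6000 sh
6000 9000 iy
9000 10000 h#"""
wav = np.random.default_rng(0).standard_normal(10000).astype(np.float32)

labels = labels_from_phn(phn, len(wav))
wav_p, labels_p = pad_with_silence(wav, labels)

# Fraction of samples labeled as speech after padding
speech_ratio = labels_p.mean()
```

With this kind of check, the padded audio and the padded labels should have the same length, the first and last second should be labeled 0, and `speech_ratio` shows how the two classes are balanced after padding.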

Examples of the training data: https://mcgill-my.sharepoint.com/:u:/g/personal/yifei_zhao_mail_mcgill_ca/EV0mKeH4U7BFpW_ZmyRBQZQBCDSP0quq4rgVsX0CtNlXfw?e=LfW8oJ

Thx!!!