Hi everyone,
I added 1 s of silence before and after each utterance from the TIMIT dataset; however, the ACAM and bDNN models couldn't learn from the training data. Instead, they simply predict 1 for all samples (as shown in the pictures below).
The training data is sampled at 16 kHz and labeled according to the .phn annotations (see also the pictures below). Does anyone have ideas on how to fix this?
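For reference, this is roughly how the padding and labeling is done (a minimal sketch; the function and file paths are just illustrative, and it assumes the `h#` symbol in the .phn files marks the non-speech portions):

```python
import numpy as np
import soundfile as sf

SR = 16000   # TIMIT sampling rate
PAD = SR     # 1 s of silence added before and after each utterance

def pad_and_label(wav_path, phn_path):
    """Pad a TIMIT utterance with 1 s of silence on each side and
    build a sample-level 0/1 label vector from the .phn annotation."""
    audio, sr = sf.read(wav_path)
    assert sr == SR

    # Sample-level labels for the original utterance:
    # every phone except the silence symbol 'h#' is treated as speech (1).
    labels = np.zeros(len(audio), dtype=np.int32)
    with open(phn_path) as f:
        for line in f:
            start, end, phone = line.split()
            if phone != 'h#':
                labels[int(start):int(end)] = 1

    # Prepend/append 1 s of zeros to both the waveform and the labels.
    silence = np.zeros(PAD, dtype=audio.dtype)
    padded_audio = np.concatenate([silence, audio, silence])
    padded_labels = np.concatenate(
        [np.zeros(PAD, dtype=np.int32), labels, np.zeros(PAD, dtype=np.int32)]
    )
    return padded_audio, padded_labels
```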
Examples of training data: https://mcgill-my.sharepoint.com/:u:/g/personal/yifei_zhao_mail_mcgill_ca/EV0mKeH4U7BFpW_ZmyRBQZQBCDSP0quq4rgVsX0CtNlXfw?e=LfW8oJ
Thanks!