noCodegirl closed this issue 6 years ago.
We cut each utterance to 3 seconds. Utterances shorter than 3 seconds are zero-padded. Utterances longer than 3 seconds are split into two segments: one taken from the beginning and the other covering the last 3 seconds. Because most utterances are shorter than 5 seconds, these two overlapping windows together cover nearly all of the speech. Since "happy" is the most difficult emotion to predict and there are few "happy" training utterances, we re-sample (oversample) the "happy" training utterances.
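A minimal sketch of the preprocessing described above, in Python with NumPy. It assumes 16 kHz mono waveforms held as 1-D arrays (the thread does not state a sample rate). The names `segment_utterance` and `oversample` and the duplication `factor` are hypothetical, chosen for illustration. Oversampling is shown as plain duplication of "happy" utterances; the author's actual re-sampling scheme is not specified.

```python
import numpy as np

SAMPLE_RATE = 16000                       # assumption: 16 kHz audio
SEGMENT_LEN = SAMPLE_RATE * 3             # 3-second window in samples


def segment_utterance(wave: np.ndarray, seg_len: int = SEGMENT_LEN) -> list:
    """Turn one variable-length 1-D waveform into fixed-length segments."""
    if len(wave) <= seg_len:
        # Short utterance: zero-pad at the end up to seg_len samples.
        return [np.pad(wave, (0, seg_len - len(wave)))]
    # Long utterance: first 3 s and last 3 s (the two windows may overlap).
    return [wave[:seg_len], wave[-seg_len:]]


def oversample(items: list, labels: list, target: str = "happy", factor: int = 2):
    """Hypothetical re-sampling: repeat each `target` utterance `factor` times."""
    out_items, out_labels = [], []
    for x, y in zip(items, labels):
        copies = factor if y == target else 1
        out_items.extend([x] * copies)
        out_labels.extend([y] * copies)
    return out_items, out_labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    waves = [rng.standard_normal(SAMPLE_RATE * 2),   # 2 s -> padded
             rng.standard_normal(SAMPLE_RATE * 4)]   # 4 s -> two windows
    labels = ["happy", "sad"]
    waves, labels = oversample(waves, labels)        # re-sample before cutting
    segments = [(seg, y) for w, y in zip(waves, labels)
                for seg in segment_utterance(w)]
    print(len(segments), {seg.shape for seg, _ in segments})
```

Re-sampling is done before segmentation here so that a long "happy" utterance contributes both of its windows each time it is repeated; doing it after segmentation would also be a reasonable choice.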
How did you process the input data format, in particular the problem of input utterances having different lengths?