yh1008 / speech-to-text

mixlingual speech recognition system; hybrid (GMM+NNet) model; Kaldi + Keras
http://llcao.net/cu-deeplearning17/project.html
70 stars 19 forks

need to trim audio files?! #5

Closed: yh1008 closed this issue 7 years ago

yh1008 commented 7 years ago

In our current code-switching dataset, a single audio file has several partial transcriptions, each covering a span given by a start and an end time in milliseconds. For example:

Given a piece of audio, 01NC01FBX_0101.flac, we have its transcriptions as follows:

01NC01FBX_0101 86300 88370 then area five 的 total 是
01NC01FBX_0101 165090 167860 不懂 but official result 还没有 出 i think 出了 他们 就会

Here 01NC01FBX_0101.flac is a single audio file, and it has multiple transcriptions for different time spans (86300-88370 ms and 165090-167860 ms).

However, it seems that Kaldi expects each utterance (one audio file) to have exactly one transcription. If that is indeed the case, we need to trim the existing audio into separate files for Kaldi to process. In the above example, we would need to create two files, 01NC01FBX_0101_86300_88370 and 01NC01FBX_0101_165090_167860, out of the original 01NC01FBX_0101.flac.
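If trimming does turn out to be necessary, a rough sketch of how the per-span files could be produced is below. It only builds the command lines and does not run them. It assumes the transcription fields are whitespace-separated, that the `sox` command-line tool is available, and that the audio lives in a directory passed as `audio_dir`; the function names are made up for illustration and are not part of this repo.

```python
def parse_line(line):
    """Split one transcription line into (recording id, start ms, end ms, text).

    Assumes whitespace-separated fields, as in the example above.
    """
    rec_id, start_ms, end_ms, text = line.split(maxsplit=3)
    return rec_id, int(start_ms), int(end_ms), text


def sox_trim_command(rec_id, start_ms, end_ms, audio_dir="."):
    """Build a `sox ... trim <start> <length>` argument list for one span.

    sox's trim effect takes a start position and a length, both in seconds,
    so the millisecond offsets are converted here.
    """
    start_s = start_ms / 1000
    length_s = (end_ms - start_ms) / 1000
    out_name = f"{rec_id}_{start_ms}_{end_ms}.flac"
    return [
        "sox",
        f"{audio_dir}/{rec_id}.flac",
        f"{audio_dir}/{out_name}",
        "trim",
        f"{start_s:.3f}",
        f"{length_s:.3f}",
    ]
```

Each returned list could be handed to `subprocess.run(cmd, check=True)` to actually write the trimmed file.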

yh1008 commented 7 years ago

I did some further googling, and it seems that Kaldi also accepts a segments file, which gives the start and end time of each utterance within a recording: http://kaldi-asr.org/doc/data_prep.html.

I am not exactly sure how it works, but it looks like I may not need to manually trim my audio files into separate ones!

yh1008 commented 7 years ago

Solved using a segments file.
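
For anyone landing here later, a minimal sketch of the segments-file route, not necessarily the exact script used in this repo. Kaldi's `segments` file has one line per utterance in the form `<utterance-id> <recording-id> <segment-begin> <segment-end>`, with times in seconds, and the `text` file has `<utterance-id> <transcript>`. The sketch assumes whitespace-separated transcription lines as in the example above, and the utterance-id naming (`<recording>_<start ms>_<end ms>`) and the function name `to_kaldi_lines` are illustrative choices.

```python
def to_kaldi_lines(transcript_lines):
    """Turn '<rec-id> <start ms> <end ms> <words...>' lines into Kaldi
    `segments` and `text` lines.

    Returns (segments, text), each a sorted list of strings.
    """
    segments, text = [], []
    for line in transcript_lines:
        if not line.strip():
            continue  # skip blank lines
        rec_id, start_ms, end_ms, words = line.split(maxsplit=3)
        utt_id = f"{rec_id}_{start_ms}_{end_ms}"
        # The segments file expects times in seconds, so convert from ms.
        start_s = int(start_ms) / 1000
        end_s = int(end_ms) / 1000
        segments.append(f"{utt_id} {rec_id} {start_s:.3f} {end_s:.3f}")
        text.append(f"{utt_id} {words.strip()}")
    # Kaldi expects its data files sorted by utterance id; for ASCII ids
    # Python's default string sort gives the same order as LC_ALL=C sort.
    return sorted(segments), sorted(text)
```

With a segments file in place, `wav.scp` is keyed by recording id (here `01NC01FBX_0101`) rather than by utterance id, so the original .flac files can stay untrimmed.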