yajiemiao / pdnn

PDNN: A Python Toolkit for Deep Learning. http://www.cs.cmu.edu/~ymiao/pdnntk.html
Apache License 2.0

kaldi data format #21

Open anarucu opened 8 years ago

anarucu commented 8 years ago

Hi, I am trying to build a DBN for language ID using PDNN. As it is a huge amount of data, I decided to use the Kaldi data format to structure my data. I use the copy-feats Kaldi binary to convert my ASCII features to .ark, but I don't know what to do with the labels. I already have ASCII files with the phonetic frame labels. How do I convert them into .ali files? Thanks in advance, Ana
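For the feature side, Kaldi's text archive layout stores each utterance as its ID followed by a bracketed matrix, one frame per row. Below is a minimal sketch that writes that layout. It assumes the ASCII features are already loaded as lists of floats. `write_kaldi_text_matrix` is a hypothetical helper and is not part of PDNN or Kaldi.

```python
import io
from typing import List, TextIO


def write_kaldi_text_matrix(out: TextIO, utt_id: str, frames: List[List[float]]) -> None:
    """Write one utterance in Kaldi's text matrix layout (hypothetical helper):

        utt_id  [
          f00 f01 ...
          f10 f11 ... ]
    """
    if not frames:
        out.write(f"{utt_id}  [ ]\n")  # empty matrix
        return
    out.write(f"{utt_id}  [\n")
    last = len(frames) - 1
    for i, row in enumerate(frames):
        values = " ".join(repr(float(v)) for v in row)  # repr keeps full float precision
        out.write(f"  {values}{' ]' if i == last else ''}\n")  # close bracket on last row


if __name__ == "__main__":
    buf = io.StringIO()
    write_kaldi_text_matrix(buf, "utt1", [[0.5, 1.0], [2.0, -3.25]])
    print(buf.getvalue(), end="")
```

As far as I know, Kaldi detects text versus binary archives when reading. That means a file written this way can be turned into a binary archive with something like `copy-feats ark:feats.txt ark:feats.ark`. Check this against your Kaldi version.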

ghost commented 8 years ago

Hi, PDNN supports the text format of Kaldi labels. You can convert your labels into a text file that contains lines such as:

    utt1 1 0 3 5 2 0 1
    utt2 2 1 3 1 4 0 1 1
    ...
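To sanity-check a converted label file, you can read it back with a few lines of Python. Below is a minimal sketch that maps each utterance ID to its list of integer frame labels. `parse_label_lines` is a hypothetical name, not a PDNN function.

```python
from typing import Dict, Iterable, List


def parse_label_lines(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Parse '<utt_id> <label> <label> ...' lines into {utt_id: [labels]}."""
    labels: Dict[str, List[int]] = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        utt_id, rest = fields[0], fields[1:]
        if utt_id in labels:
            raise ValueError(f"line {lineno}: duplicate utterance ID {utt_id!r}")
        labels[utt_id] = [int(tok) for tok in rest]  # int() rejects non-integer labels
    return labels


if __name__ == "__main__":
    text = "utt1 1 0 3 5 2 0 1\nutt2 2 1 3 1 4 0 1 1\n"
    for utt, labs in parse_label_lines(text.splitlines()).items():
        print(utt, len(labs), labs)
```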

The first field is always the utterance ID, followed by a sequence of class labels (integer indices), one per frame. In the example above, utt1 has 7 frames, whose class labels are "1 0 3 5 2 0 1".
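Because there is one label per frame, each label sequence should be exactly as long as the corresponding feature matrix. Below is a minimal sketch that writes the label file and checks those lengths. It assumes your frame labels are already in a dict keyed by utterance ID. It also assumes the per-utterance frame counts are available as `num_frames`, for example from the output of Kaldi's feat-to-len tool. Both function names are hypothetical.

```python
import io
from typing import Dict, List, TextIO


def format_label_line(utt_id: str, frame_labels: List[int]) -> str:
    """One line of the text label file: utterance ID, then one integer class per frame."""
    return " ".join([utt_id] + [str(int(c)) for c in frame_labels])


def write_label_file(labels: Dict[str, List[int]],
                     num_frames: Dict[str, int],
                     out: TextIO) -> None:
    """Write all utterances, checking that each label sequence has exactly
    one label per feature frame (num_frames is assumed to come from the features)."""
    for utt_id, frame_labels in labels.items():
        expected = num_frames.get(utt_id)
        if expected is None:
            raise KeyError(f"no features for utterance {utt_id!r}")
        if len(frame_labels) != expected:
            raise ValueError(
                f"{utt_id}: {len(frame_labels)} labels but {expected} feature frames")
        out.write(format_label_line(utt_id, frame_labels) + "\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_label_file({"utt1": [1, 0, 3, 5, 2, 0, 1]}, {"utt1": 7}, buf)
    print(buf.getvalue(), end="")
```

Failing loudly on a length mismatch is deliberate. An off-by-one between labels and frames, often caused by a different frame shift or window setting, would otherwise silently misalign every label that follows it in the utterance.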