yajiemiao / pdnn

PDNN: A Python Toolkit for Deep Learning. http://www.cs.cmu.edu/~ymiao/pdnntk.html
Apache License 2.0

Support data I/O options: context, ignore-label, map-label #14

Closed MaigoAkisame closed 9 years ago

MaigoAkisame commented 9 years ago

Hi Yajie,

I have implemented the options "context", "ignore-label", and "map-label" when reading any data format.

Suppose you have a "train.pfile" with 10 classes (0~9), and you want to do the following things:

* Treat the classes 3, 4, 5 as one class, and class 6 as the other;
* Train a classifier for the two classes defined above, and ignore all other classes;
* Pad all the features with 5 frames on both sides.

You can specify --train-data "train.pfile,context=5,ignore-label=0-2:7-9,map-label=3-5:0/6:1" to achieve what you want.
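To make the option syntax concrete, here is a rough sketch of how such a specification string could be interpreted. It is not the code from this pull request: the function names (`parse_data_spec`, `parse_label_ranges`, `build_label_rules`, `relabel`) are made up for illustration, and the reading of "/" as the separator between mapping rules is inferred from the example above.

```python
def parse_data_spec(spec):
    """Split 'file,opt=value,...' into a file name and an options dict."""
    path, *pairs = spec.split(",")
    options = dict(pair.split("=", 1) for pair in pairs)
    return path, options


def parse_label_ranges(text):
    """Expand a colon-separated list of ranges, e.g. '0-2:7-9' -> {0, 1, 2, 7, 8, 9}."""
    labels = set()
    for part in text.split(":"):
        lo, _, hi = part.partition("-")
        # A single number such as '6' has no '-', so hi is empty and we reuse lo.
        labels.update(range(int(lo), int(hi or lo) + 1))
    return labels


def build_label_rules(options):
    """Return (labels to ignore, old-label -> new-label mapping). Hypothetical helper."""
    ignore = set()
    if "ignore-label" in options:
        ignore = parse_label_ranges(options["ignore-label"])
    mapping = {}
    if "map-label" in options:
        # Assumed syntax: rules separated by '/', each rule is 'sources:target'.
        for rule in options["map-label"].split("/"):
            sources, _, target = rule.rpartition(":")
            for label in parse_label_ranges(sources):
                mapping[label] = int(target)
    return ignore, mapping


def relabel(labels, ignore, mapping):
    """Drop ignored frames; return (frame index, new label) for the frames kept."""
    return [(i, mapping.get(label, label))
            for i, label in enumerate(labels) if label not in ignore]


spec = "train.pfile,context=5,ignore-label=0-2:7-9,map-label=3-5:0/6:1"
path, options = parse_data_spec(spec)
ignore, mapping = build_label_rules(options)
kept = relabel(list(range(10)), ignore, mapping)
```

With the ten labels 0 to 9 as input, only the frames labelled 3, 4, 5, and 6 survive, and they end up in the two new classes 0 and 1. Returning the frame indices alongside the new labels lets a caller drop the matching feature rows as well.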

Here, "context=5" can be replaced by "context=5:5" (specifying the left and right contexts separately), or "lcxt=5,rcxt=5" (as you originally supported for Kaldi feature files).
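The three equivalent spellings can be reduced to one (left, right) pair, as in the sketch below. Both function names are hypothetical. The `splice` helper assumes that "padding with context" means concatenating each frame with its neighbours, and that the first and last frames are repeated at the boundaries; the actual implementation may treat boundaries differently.

```python
def context_from_options(options):
    """Normalize 'context=N', 'context=L:R', or 'lcxt=L,rcxt=R' to (left, right)."""
    if "context" in options:
        left, _, right = options["context"].partition(":")
        # 'context=5' has no ':', so right is empty and the value is used for both sides.
        return int(left), int(right or left)
    return int(options.get("lcxt", 0)), int(options.get("rcxt", 0))


def splice(frames, left, right):
    """Concatenate each frame with `left` previous and `right` following frames.

    Assumption: indices outside the sequence are clamped, i.e. the edge
    frames are repeated.
    """
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window = []
        for offset in range(-left, right + 1):
            window.extend(frames[min(max(t + offset, 0), last)])
        spliced.append(window)
    return spliced


same = [
    context_from_options({"context": "5"}),
    context_from_options({"context": "5:5"}),
    context_from_options({"lcxt": "5", "rcxt": "5"}),
]
toy = splice([[1], [2], [3]], 1, 1)
```

Here all three entries of `same` are the pair (5, 5), and each one-dimensional toy frame grows to three dimensions after splicing with one frame of context on each side.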

The usage of punctuation marks is rather messy, but it can be summarized as:

* Commas are used to separate options;
* Colons are used to separate numbers in values of options;