Support data I/O options: context, ignore-label, map-label

Hi Yajie,

I have implemented the options "context", "ignore-label", and "map-label" when reading any data format.

Suppose you have a "train.pfile" with 10 classes (0~9), and you want to do the following:

* Treat classes 3, 4, and 5 as one class, and class 6 as another;
* Train a classifier for the two classes defined above, ignoring all other classes;
* Pad all the features with 5 frames of context on both sides.

You can specify --train-data "train.pfile,context=5,ignore-label=0-2:7-9,map-label=3-5:0/6:1" to achieve this.
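To make the effect of "ignore-label" and "map-label" concrete, here is a minimal NumPy sketch of what the example is meant to do to the labels. It is an illustration only, not the code in this pull request: the function name `filter_and_map_labels` and the toy arrays are hypothetical, and I am assuming that ignored frames are simply dropped together with their features.

```python
import numpy as np

def filter_and_map_labels(features, labels, ignore, mapping):
    """Drop frames whose label is in `ignore`, then relabel the rest via `mapping`.

    Labels that appear in neither `ignore` nor `mapping` are kept unchanged
    (an assumption made for this sketch).
    """
    features = np.asarray(features)
    labels = np.asarray(labels)
    keep = ~np.isin(labels, sorted(ignore))          # True for frames we keep
    kept_labels = labels[keep]
    mapped = np.array([mapping.get(int(l), int(l)) for l in kept_labels],
                      dtype=labels.dtype)
    return features[keep], mapped

# Toy data: 10 frames with 2 features each, one frame per class 0..9.
features = np.arange(20, dtype=float).reshape(10, 2)
labels = np.arange(10)

# Equivalent of ignore-label=0-2:7-9 and map-label=3-5:0/6:1
ignore = {0, 1, 2, 7, 8, 9}
mapping = {3: 0, 4: 0, 5: 0, 6: 1}

new_feats, new_labels = filter_and_map_labels(features, labels, ignore, mapping)
print(new_labels)        # the four surviving frames, relabelled to two classes
print(new_feats.shape)
```

Only the frames originally labelled 3 to 6 survive, and they end up in the two classes 0 and 1.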

Here, "context=5" can be replaced by "context=5:5" (specifying the left and right contexts separately), or "lcxt=5,rcxt=5" (as you originally supported for Kaldi feature files).
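For readers unfamiliar with context padding, the sketch below shows one common way to splice each frame with its left and right neighbours. It is not taken from the implementation here: the function name `splice` is hypothetical, the input is treated as a single utterance, and repeating the first and last frames at the boundaries is an assumption (the actual boundary handling may differ).

```python
import numpy as np

def splice(features, left, right):
    """Concatenate every frame with `left` preceding and `right` following frames.

    Assumption: at the start and end of the sequence, the first and last
    frames are repeated to fill the missing context.
    """
    features = np.asarray(features)
    n = features.shape[0]
    padded = np.concatenate([
        np.repeat(features[:1], left, axis=0),    # repeated first frame
        features,
        np.repeat(features[-1:], right, axis=0),  # repeated last frame
    ], axis=0)
    # Row t of the result holds frames t-left .. t+right side by side.
    return np.concatenate([padded[i:i + n] for i in range(left + right + 1)],
                          axis=1)

small = np.arange(6).reshape(3, 2)      # 3 frames, 2 features each
print(splice(small, 1, 1))

# context=5 on 2-dimensional features gives 2 * (5 + 1 + 5) = 22 dimensions.
print(splice(np.zeros((10, 2)), 5, 5).shape)
```

With "context=5", the number of frames stays the same while the feature dimension grows by a factor of 11.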

The usage of punctuation marks is rather messy, but it can be summarized as:

* Commas are used to separate options;
* Colons are used to separate numbers in values of options; but in "map-label", slashes are used to separate mappings, and colons are used to separate the original and mapped labels;
* Dashes are used to denote a range of labels.
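The punctuation rules above can be sketched as a small parser. This is a hypothetical illustration of the syntax, not the parsing code in this pull request: the names `parse_range_list` and `parse_data_spec` and the layout of the returned dictionary are my own assumptions.

```python
def parse_range_list(text):
    """Expand a colon-separated list with dash ranges, e.g. '0-2:7-9'."""
    labels = []
    for part in text.split(":"):
        if "-" in part:
            lo, hi = part.split("-")
            labels.extend(range(int(lo), int(hi) + 1))   # ranges are inclusive
        else:
            labels.append(int(part))
    return labels

def parse_data_spec(spec):
    """Split 'file,opt=value,...' into the file name and its options."""
    fields = spec.split(",")                 # commas separate options
    parsed = {"path": fields[0], "left": 0, "right": 0,
              "ignore": set(), "map": {}}
    for field in fields[1:]:
        key, value = field.split("=", 1)
        if key == "context":
            nums = [int(x) for x in value.split(":")]
            # 'context=5' sets both sides; 'context=5:5' sets left and right
            parsed["left"], parsed["right"] = nums[0], nums[-1]
        elif key == "lcxt":
            parsed["left"] = int(value)
        elif key == "rcxt":
            parsed["right"] = int(value)
        elif key == "ignore-label":
            parsed["ignore"] = set(parse_range_list(value))
        elif key == "map-label":
            for rule in value.split("/"):    # slashes separate mappings
                src, dst = rule.split(":")   # original labels : mapped label
                for label in parse_range_list(src):
                    parsed["map"][label] = int(dst)
        else:
            raise ValueError("unknown option: " + key)
    return parsed

spec = "train.pfile,context=5,ignore-label=0-2:7-9,map-label=3-5:0/6:1"
result = parse_data_spec(spec)
print(result)
```

In this sketch, "context=5:5" and "lcxt=5,rcxt=5" produce the same left and right values as "context=5".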

I tried to keep my implementation compatible with all existing functionality (e.g. both the stream and non-stream modes of pfile reading). I have tested it with pickle files and pfiles, but not with Kaldi files; if you have some Kaldi files, could you test it on them?

yajiemiao / pdnn #14