karuiwu / reparse

1 stars 0 forks source link

Compiling MaltParser: ant dist java -jar ./dist/maltparser-1.7.2/maltparser-1.7.2.jar -c test -i examples/data/english_train.conll -m learn java -jar ./dist/maltparser-1.7.2/maltparser-1.7.2.jar -c test -i examples/data/english_test.conll -o out.conll -m parse -v off

java -jar ./dist/maltparser-1.7.2/maltparser-1.7.2.jar -c test -i ../data/parser/train/0 -m learn java -jar ./dist/maltparser-1.7.2/maltparser-1.7.2.jar -c test -i ../data/parser/dev/0 -o out.conll -m parse -v off > output

Running CoNLL data splitting tool: num_sets indicates not only how many train and dev data sets produced (through jack-knifing), but also the ratio of training to dev data. For instance, if we have 10,000 sentences in train.dep.parse, setting num_sets to 2 will yield 5,000 training sentences and 5,000 dev sentences (x2). If we set num_sets to 10, we will have 9,000 training sentences and 1,000 dev sentences (x10). Therefore, I like to run with num_sets >= 10. python CoNLLSplitter.py train.dep.parse [num_sets]

MegaM: Training: ./megam.opt [multiclass, binary] data/classifier/train/0 > weights Predicting: ./megam.opt -predict weights [multiclass, binary] data/classifier/dev/0 | head -5

Libraries used to compile our modified zPar: You will need to install

Installing Boost and Vowpal Wabbit: On linux systems such as debian, both of these packages can be installed by calling:

sudo apt-get install libboost-all-dev vowpal-wabbit libvw0 libvw-dev

Additional Note: For some reason, some of vowpal wabbit's files are not included in the apt-get package install. We were able to get it all working by downloading the vowpal wabbit source from the github repository and moving the correct file location (in our case it was /usr/include/vowpalwabbit)

In order to compile zPar in the zpar/ directory, just run: make generic.depparser

And to run the parser, you will need to: train: dist/generic.depparser/train [input Conll training data file] model 1 test: dist/generic.depparser/depparser -c [input Conll test data file] [output file] model