Author: Karthik Narasimhan (karthikn@csail.mit.edu)
You can clone the repository and use the production2 branch (default) for the latest code.
Set jdk.home.1.7 in the build.properties file to your local JDK install. Set path.variable.maven_repository in build.properties to your local Maven repository if you wish to use your Maven installs.
Use 'ant all' to compile on the terminal (requires ant version > 1.6). You can also directly import the entire directory into IntelliJ or Eclipse and compile using the GUI.
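If you prefer to make the build.properties changes from the terminal, here is a minimal sketch. The helper name set_prop and the example paths are hypothetical and not part of this repository. It also assumes build.properties stores each setting as a plain key=value line, which may not match your copy of the file.

```shell
# set_prop FILE KEY VALUE
# Hypothetical helper: rewrites an existing "KEY=..." line in FILE to "KEY=VALUE".
# Assumes plain key=value lines and a VALUE that contains no '|' or '&' characters.
# If KEY is not present in FILE, the file is left unchanged.
set_prop() {
  file=$1
  key=$2
  value=$3
  # Escape the dots in the key so sed matches them literally.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  # -i.bak edits in place and keeps the original as FILE.bak.
  sed -i.bak "s|^${pattern}=.*|${key}=${value}|" "$file"
}

# Example usage (placeholder paths; substitute your own):
#   set_prop build.properties jdk.home.1.7 /path/to/your/jdk
#   set_prop build.properties path.variable.maven_repository "$HOME/.m2/repository"
#   ant all
```

Editing the file by hand works just as well; the helper only saves retyping when you switch between machines.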
Here is an example of how to run the code from the home directory of the project. The output will contain the predicted segmentations for all the words in the test file. If you do not have gold segmentations
to test against, you can just input a file with the word as its own segmentation (i.e.
PARAMS_FILE=params.properties;
OUT_FILE=output.txt;
java -ea -Djava.library.path=lib/ -classpath "./lib/*:./out/production/Morphology" Main $PARAMS_FILE >$OUT_FILE
Most parameters in the model can be changed in the file params.properties.
A good tool to produce your own vectors from a raw corpus is word2vec. You can also use any pre-existing vectors as long as they follow the format specified in FORMATS.txt.
Please use the issue tracker or email me if you have any questions/suggestions.