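With this code you can explore a context2vec model interactively, train your own models, and run the sentence completion and word sense disambiguation evaluations described below.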
Please cite the following paper if using the code:
context2vec: Learning Generic Context Embedding with Bidirectional LSTM
Oren Melamud, Jacob Goldberger, Ido Dagan. CoNLL, 2016 [pdf].
Note: Release 1.0 includes the original code that was used in the context2vec paper and has different dependencies (Python 2.7 and Chainer 1.7).
python setup.py install
python context2vec/eval/explore_context2vec.py MODEL_DIR/MODEL_NAME.params
>> this is a [] book
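The `[]` in the prompt above marks the slot of the target word whose context is being represented. As a rough illustration of how such a query could be split into tokens and a target position, here is a minimal sketch. `parse_context_query` is a hypothetical helper, not a function from the context2vec package, and the actual script may accept other input forms or parse them differently.

```python
def parse_context_query(line, placeholder="[]"):
    """Split a query into tokens and locate the single target slot.

    Hypothetical helper for illustration only; not part of context2vec.
    """
    tokens = line.split()
    positions = [i for i, tok in enumerate(tokens) if tok == placeholder]
    if len(positions) != 1:
        raise ValueError("expected exactly one %r in the query" % placeholder)
    return tokens, positions[0]


tokens, target_pos = parse_context_query("this is a [] book")
print(tokens, target_pos)
```

The sketch assumes whitespace tokenization and exactly one slot per query; both are simplifications made for the example.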
python context2vec/train/corpus_by_sent_length.py CORPUS_FILE [max-sentence-length]
python context2vec/train/train_context2vec.py -i CORPUS_FILE.DIR -w WORD_EMBEDDINGS -m MODEL -c lstm --deep yes -t 3 --dropout 0.0 -u 300 -e 10 -p 0.75 -b 100 -g 0
NOTE: Some users have noted that this configuration can cause exploding
gradients (see issue #6). One option is to turn down the learning rate by
reducing the Adam optimizer's alpha from 0.001 to something lower, e.g. by
specifying -a 0.0005. As an extra safety measure, you can enable gradient
clipping, which could be set to 5 by using the very scientific method of
using the value everyone else seems to be using: -gc 5.
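To make the gradient clipping idea concrete, the following is a minimal sketch of norm-based clipping in plain Python: if the joint L2 norm of all gradients exceeds a threshold, every gradient is scaled down by the same factor. That the -gc flag performs exactly this kind of clipping is an assumption, and `clip_by_global_norm` is a hypothetical name used only for this example.

```python
import math


def clip_by_global_norm(grads, threshold):
    """Rescale gradient vectors so their joint L2 norm is at most threshold.

    Illustrative only; assumes norm-based clipping, which may differ from
    what the training script does internally.
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total_norm <= threshold:
        # Already within bounds: return copies unchanged.
        return [list(vec) for vec in grads], total_norm
    scale = threshold / total_norm
    return [[g * scale for g in vec] for vec in grads], total_norm


grads = [[30.0, 40.0], [0.0]]  # joint norm is 50
clipped, norm_before = clip_by_global_norm(grads, threshold=5.0)
print(norm_before, clipped)
```

Scaling all gradients by one factor keeps the update direction and only limits its size, which is why clipping is often paired with, rather than substituted for, a lower learning rate.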
python context2vec/eval/mscc_text_tokenize.py INPUT_FILE OUTPUT_FILE
Run the command above for every INPUT_FILE in the MSCC train set.
python context2vec/eval/sentence_completion.py Holmes.machine_format.questions.txt Holmes.machine_format.answers.txt RESULTS_FILE MODEL_NAME.params
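Running the tokenizer once per file in the MSCC train set can be scripted. The sketch below builds one command per file in a directory. The directory layout, the `.tok` output suffix, and the helper name `build_tokenize_commands` are assumptions made for illustration and are not part of the repository.

```python
import os
import tempfile


def build_tokenize_commands(input_dir, output_dir,
                            script="context2vec/eval/mscc_text_tokenize.py"):
    """Return one command (as an argument list) per regular file in input_dir."""
    commands = []
    for name in sorted(os.listdir(input_dir)):
        src = os.path.join(input_dir, name)
        if not os.path.isfile(src):
            continue  # skip subdirectories
        dst = os.path.join(output_dir, name + ".tok")  # hypothetical naming scheme
        commands.append(["python", script, src, dst])
    return commands


if __name__ == "__main__":
    # Demo on a throwaway directory; point input_dir at the MSCC train files instead.
    with tempfile.TemporaryDirectory() as demo_dir:
        for fname in ("a.txt", "b.txt"):
            open(os.path.join(demo_dir, fname), "w").close()
        for cmd in build_tokenize_commands(demo_dir, demo_dir):
            # To execute instead of print: subprocess.run(cmd, check=True)
            print(" ".join(cmd))
```

The demo only prints the commands, so nothing is executed until each list is passed to `subprocess.run`.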
python context2vec/eval/wsd/wsd_main.py EnglishLS.train EnglishLS.train RESULTS_FILE MODEL_NAME.params 1
scorer2 RESULTS_FILE EnglishLS.train.key EnglishLS.sensemap
python context2vec/eval/wsd/wsd_main.py EnglishLS.train EnglishLS.test RESULTS_FILE MODEL_NAME.params 1
scorer2 RESULTS_FILE EnglishLS.test.key EnglishLS.sensemap
The code for the lexical substitution evaluation is included in a separate repository [here].
Apache 2.0