PyTorch Implementation of a Deep Learning Approach for the Relation Extraction Challenge (SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals)
Feel free to watch, star, or fork.
This repo was tested on Python 3.6+ and PyTorch 1.0.0. The requirements are:
Relation | Train Data | Test Data | Total Data |
---|---|---|---|
Cause-Effect | 1,003 (12.54%) | 328 (12.07%) | 1,331 (12.42%) |
Instrument-Agency | 504 (6.30%) | 156 (5.74%) | 660 (6.16%) |
Product-Producer | 717 (8.96%) | 231 (8.50%) | 948 (8.85%) |
Content-Container | 540 (6.75%) | 192 (7.07%) | 732 (6.83%) |
Entity-Origin | 716 (8.95%) | 258 (9.50%) | 974 (9.09%) |
Entity-Destination | 845 (10.56%) | 292 (10.75%) | 1,137 (10.61%) |
Component-Whole | 941 (11.76%) | 312 (11.48%) | 1,253 (11.69%) |
Member-Collection | 690 (8.63%) | 233 (8.58%) | 923 (8.61%) |
Message-Topic | 634 (7.92%) | 261 (9.61%) | 895 (8.35%) |
Other | 1,410 (17.63%) | 454 (16.71%) | 1,864 (17.39%) |
Total | 8,000 (100.00%) | 2,717 (100.00%) | 10,717 (100.00%) |
`Vector_50d.txt` is used as the pre-trained word2vec model. Build vocabularies and parameters for your dataset by running
```shell
python build_vocab.py --data_dir data/SemEval2010_task8
```
It will write vocabulary files `words.txt` and `labels.txt` containing the words and labels in the dataset. It will also save a `dataset_params.json` with some extra information.
Your experiment

We created a `base_model` directory for you under the `experiments` directory. It contains a file `params.json` which sets the hyperparameters for the experiment. It looks like
```json
{
  "learning_rate": 1e-3,
  "batch_size": 50,
  "num_epochs": 100
}
```
For every new experiment, you will need to create a new directory under `experiments` with a `params.json` file.
Train your experiment. Simply run
```shell
python train.py --data_dir data/SemEval2010_task8 --model_dir experiments/base_model
```
It will instantiate a model and train it on the training set, following the hyperparameters specified in `params.json`. It will also evaluate some metrics on the development set.
Evaluation on the test set

Once you've run many experiments and selected your best model and hyperparameters based on performance on the development set, you can finally evaluate your model on the test set. Run
```shell
python evaluate.py --data_dir data/SemEval2010_task8 --model_dir experiments/base_model
```
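The official SemEval-2010 Task 8 metric is the macro-averaged F1 over the nine relation types, with Other excluded from the averaging. A sketch of that computation (the function is illustrative; the repo may use the official Perl scorer instead):

```python
from collections import Counter

def macro_prf(gold, pred, ignore=("Other",)):
    """Macro-averaged precision, recall, and F1 over labels, skipping `ignore`."""
    labels = {g for g in gold if g not in ignore}
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    precisions = [tp[l] / (tp[l] + fp[l]) if tp[l] + fp[l] else 0.0 for l in labels]
    recalls = [tp[l] / (tp[l] + fn[l]) if tp[l] + fn[l] else 0.0 for l in labels]
    p = sum(precisions) / len(labels)
    r = sum(recalls) / len(labels)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```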
BiLSTM + Attention
Precision | Recall | F1 |
---|---|---|
79.13 | 82.29 | 80.68 |
BiLSTM + MaxPooling
Precision | Recall | F1 |
---|---|---|
79.99 | 78.73 | 78.93 |
CNN
Precision | Recall | F1 |
---|---|---|
80.11 | 84.54 | 82.27 |
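The BiLSTM + Attention architecture benchmarked above commonly follows the word-level attention pattern of Zhou et al. (2016): attention weights over the BiLSTM outputs produce a single sentence vector that is fed to a classifier. A hedged PyTorch sketch, with illustrative dimensions (50-d embeddings matching `Vector_50d.txt`, and 19 classes = 9 relations × 2 directions + Other), not the repo's exact code:

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """BiLSTM encoder with word-level attention pooling for relation classification."""

    def __init__(self, vocab_size, embed_dim=50, hidden_dim=100, num_classes=19):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # one attention score per timestep of the BiLSTM output
        self.att = nn.Linear(2 * hidden_dim, 1, bias=False)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):                       # x: (batch, seq_len) token ids
        h, _ = self.lstm(self.embed(x))         # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.att(torch.tanh(h)), dim=1)
        sentence = (weights * h).sum(dim=1)     # attention-weighted sum
        return self.fc(sentence)                # (batch, num_classes) logits
```

Replacing the attention pooling with `h.max(dim=1).values` yields the BiLSTM + MaxPooling variant from the second table.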