# Check Out Our New NER Toolkit 🚀🚀🚀
**AutoNER** trains named entity taggers with distant supervision, with no line-by-line annotations required.
Details about AutoNER can be accessed at: https://arxiv.org/abs/1809.03599
## Benchmarks

Performance on the BC5CDR dataset (precision, recall, and F1 in %):

| Method | Precision | Recall | F1 |
| ------ | --------- | ------ | -- |
| Supervised Benchmark | 88.84 | 85.16 | 86.96 |
| Dictionary Match | 93.93 | 58.35 | 71.98 |
| Fuzzy-LSTM-CRF | 88.27 | 76.75 | 82.11 |
| AutoNER | 88.96 | 81.00 | 84.80 |
## Required Inputs

- **Raw text**: `data/BC5CDR/raw_text.txt`
- **Core dictionary (with type info)**: `data/BC5CDR/dict_core.txt`
- **Full dictionary (without type info)**: `data/BC5CDR/dict_full.txt`
- **Pre-trained word embedding**: `embedding/bio_embedding.txt`, which can be downloaded from our group's server. For example:

```
curl http://dmserv4.cs.illinois.edu/bio_embedding.txt -o embedding/bio_embedding.txt
```

Since the embedding encoding step consumes quite a lot of memory, we also provide the encoded file in `autoner_train.sh`.
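To make the expected layouts concrete, here is a small illustrative excerpt of the two dictionary files. The entries are made up, and the layout is an assumption: tab-separated `Type<TAB>surface` lines for the core dictionary, and one high-quality phrase per line for the full dictionary.

```
# dict_core.txt -- assumed layout: tab-separated type and tokenized surface form
Chemical	indomethacin
Disease	renal failure

# dict_full.txt -- assumed layout: one tokenized high-quality phrase per line
indomethacin
renal failure
prostaglandin
```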
## Development & Test Sets

`data/BC5CDR/truth_dev.ck` and `data/BC5CDR/truth_test.ck` are the development and test sets. Each line has three columns: the token, its `Tie or Break` label, and its entity type.

- `I` means `Break`.
- `O` means `Tie`.
- `<s>` and `<eof>` mark the start and the end of a sentence.
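For orientation, a sketch of that three-column layout (tab-separated here). The tokens and entity types are invented, and the type marker used for non-entity tokens is an assumption; only the column order, the `I`/`O` labels, and the `<s>`/`<eof>` markers follow the description above.

```
<s>	I	None
indomethacin	I	Chemical
induced	I	None
renal	I	Disease
failure	O	Disease
<eof>	I	None
```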
## Dependencies

This project is based on `python>=3.6`. The dependent packages are listed below:
```
numpy==1.13.1
tqdm
torch-scope>=0.5.0
pytorch==0.4.1
```
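If you manage dependencies with pip, a minimal setup could look like the following. Note that PyTorch's pip package is named `torch`, and a 0.4.1 wheel must be available for your platform.

```
# Install the pinned dependencies into the current environment.
pip install numpy==1.13.1 tqdm "torch-scope>=0.5.0" torch==0.4.1
```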
## Commands

To train an AutoNER model, please run:

```
./autoner_train.sh
```

To apply the trained AutoNER model, please run:

```
./autoner_test.sh
```

You can specify the parameters in the bash files; the variable names are self-explanatory.
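As an illustration of what those parameters typically cover, a configuration block at the top of `autoner_train.sh` might look like this. All variable names below are hypothetical; check the script itself for the actual ones.

```
# Hypothetical variables -- the actual names are defined in autoner_train.sh.
MODEL_NAME=BC5CDR
RAW_TEXT=data/BC5CDR/raw_text.txt
DICT_CORE=data/BC5CDR/dict_core.txt
DICT_FULL=data/BC5CDR/dict_full.txt
EMBEDDING=embedding/bio_embedding.txt
GPU_ID=0
```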
## Citation

Please cite the following two papers if you are using our tool. Thanks!
```
@inproceedings{shang2018learning,
  title = {Learning Named Entity Tagger using Domain-Specific Dictionary},
  author = {Shang, Jingbo and Liu, Liyuan and Ren, Xiang and Gu, Xiaotao and Ren, Teng and Han, Jiawei},
  booktitle = {EMNLP},
  year = {2018}
}

@article{shang2018automated,
  title = {Automated Phrase Mining from Massive Text Corpora},
  author = {Shang, Jingbo and Liu, Jialu and Jiang, Meng and Ren, Xiang and Voss, Clare R and Han, Jiawei},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  year = {2018},
  publisher = {IEEE}
}
```