Code for our ACL2019 paper Reliability-aware Dynamic Feature Composition for Name Tagging.
Note: the code expects each subset to provide train.tsv, dev.tsv, test.tsv, token.vocab.tsv, char.vocab.tsv, and label.vocab.tsv in its directory. The *.vocab.tsv files are built from a merged data set of all subsets.
The following functions in proprocess.py can be used to create vocab and frequency files.
build_all_vocabs takes as input a list of CoNLL format files and generates {token,char,label}.vocab.tsv in output_dir.
build_embed_vocab takes a pre-trained embedding file as input and returns the embedding vocab.
build_embed_token_count takes a pre-trained embedding file as input and generates an embedding token frequency file.
python train_lstmcnn_all.py -d 0 -i <input_dir> -o <output_dir> -e <embedding_file> --embed_vocab <embedding_vocab_file> --char_dim 50 --seed <random_seed>
This script trains a model for each subset (which can be specified with the --datasets argument) and reports within-subset (within-genre) and cross-subset (cross-genre) performance.
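
The within-subset and cross-subset reporting described above can be pictured as a grid: one model per training subset, scored on every subset's test data. The following is a minimal sketch of that idea only, not the repository's code. The names evaluate_grid, split_scores, train_fn, and score_fn are hypothetical, and the subset names "bn" and "nw" and the toy scores are placeholders.

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]


def evaluate_grid(
    subsets: List[str],
    train_fn: Callable[[str], object],
    score_fn: Callable[[object, str], float],
) -> Scores:
    """Train one model per subset and score it on every subset.

    train_fn and score_fn are hypothetical callables supplied by the caller.
    """
    scores: Scores = {}
    for source in subsets:
        model = train_fn(source)  # model trained on `source` only
        for target in subsets:
            scores[(source, target)] = score_fn(model, target)
    return scores


def split_scores(scores: Scores) -> Tuple[Dict[str, float], Scores]:
    """Separate within-subset (diagonal) from cross-subset (off-diagonal) scores."""
    within = {s: v for (s, t), v in scores.items() if s == t}
    cross = {(s, t): v for (s, t), v in scores.items() if s != t}
    return within, cross


# Toy stand-ins: the "model" is just the name of its training subset, and
# the score is higher when the test subset matches it.
toy_scores = evaluate_grid(
    ["bn", "nw"],
    train_fn=lambda subset: subset,
    score_fn=lambda model, target: 1.0 if model == target else 0.5,
)
within, cross = split_scores(toy_scores)
```

In a real run, train_fn and score_fn would wrap actual training and F-score computation; the grid layout stays the same.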
python train_lstmcnn_dfc_all.py -d 0 -i <input_dir> -o <output_dir> -e <embedding_file> --embed_vocab <embedding_vocab_file> --embed_count <embedding_freq_file> --char_dim 50 --seed <random_seed>
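
For readers preparing their own data, the sketch below illustrates the kind of counting that building token, character, and label vocabularies from CoNLL format files involves. It is not the implementation of build_all_vocabs. The function names count_conll and merged_counts are hypothetical, and the column layout (token first, label last, blank lines between sentences) is an assumption. Because the layout of the *.vocab.tsv files is not described here, the sketch stops at in-memory counts.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Counts = Tuple[Counter, Counter, Counter]


def count_conll(lines: Iterable[str]) -> Counts:
    """Count tokens, characters, and labels in CoNLL-style lines.

    Assumption: whitespace-separated columns, token first, label last.
    """
    tokens, chars, labels = Counter(), Counter(), Counter()
    for line in lines:
        columns = line.split()
        # Skip sentence separators, malformed lines, and document markers.
        if len(columns) < 2 or columns[0] == "-DOCSTART-":
            continue
        token, label = columns[0], columns[-1]
        tokens[token] += 1
        chars.update(token)  # iterating a string yields its characters
        labels[label] += 1
    return tokens, chars, labels


def merged_counts(paths: List[str]) -> Counts:
    """Merge counts over several files, e.g. the files of all subsets."""
    tokens, chars, labels = Counter(), Counter(), Counter()
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            file_tokens, file_chars, file_labels = count_conll(handle)
        tokens += file_tokens
        chars += file_chars
        labels += file_labels
    return tokens, chars, labels


# Tiny in-memory example with two sentences.
sample = ["John B-PER", "lives O", "in O", "Paris B-LOC", "", "Paris B-LOC"]
tokens, chars, labels = count_conll(sample)
```

Counting over the merged files of every subset, as merged_counts does, matches the note that the vocab files come from a merged data set of all subsets.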
Lin, Y., Liu, L., Ji, H., Yu, D., Han, J. (2019) Reliability-aware Dynamic Feature Composition for Name Tagging. Proceedings of The 57th Annual Meeting of the Association for Computational Linguistics.
@inproceedings{lin2019reliability,
title={Reliability-aware Dynamic Feature Composition for Name Tagging},
author={Lin, Ying and Liu, Liyuan and Ji, Heng and Yu, Dong and Han, Jiawei},
booktitle={Proceedings of The 57th Annual Meeting of the Association for Computational Linguistics (ACL2019)},
year={2019}
}