Code and checkpoints for the ACL 2021 paper "Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter"
Arxiv link of the paper: https://arxiv.org/abs/2105.07148
For any questions, please contact: willie1206@163.com
The data should be in CoNLL format (the BIOES tag scheme is preferred), with one character and its label per line, separated by a space. Sentences are separated by a blank line.
美 B-LOC
国 E-LOC
的 O
华 B-PER
莱 I-PER
士 E-PER
我 O
跟 O
他 O
谈 O
笑 O
风 O
生 O
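The conversion script itself is in the repo, but a minimal reader for this line-per-character format might look like the following sketch (the function name `read_conll` is ours, not from the repo):

```python
def read_conll(path):
    """Parse a CoNLL-style file (one 'char label' pair per line,
    blank line between sentences) into (chars, labels) pairs."""
    sentences = []
    chars, labels = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                # Blank line: close off the current sentence, if any.
                if chars:
                    sentences.append((chars, labels))
                    chars, labels = [], []
                continue
            ch, tag = line.split()
            chars.append(ch)
            labels.append(tag)
    if chars:  # handle a file without a trailing blank line
        sentences.append((chars, labels))
    return sentences
```

Each returned element is a `(characters, labels)` pair for one sentence, which is the natural unit to serialize into the JSON file used for training.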
Chinese BERT: https://huggingface.co/bert-base-chinese/tree/main
Word Embedding: https://ai.tencent.com/ailab/nlp/en/data/Tencent_AILab_ChineseEmbedding.tar.gz
The original download link no longer works; use this one instead:
Word Embedding: https://ai.tencent.com/ailab/nlp/en/data/tencent-ailab-embedding-zh-d200-v0.2.0.tar.gz
For more information, see the Tencent AI Lab Word Embedding page.
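Assuming the Tencent archive unpacks to a standard word2vec-format text file (first line `vocab_size dim`, then one word per line followed by its float components), a minimal loader sketch (the function name `load_embeddings` is ours):

```python
def load_embeddings(path, limit=None):
    """Load a word2vec-format text embedding file.

    The first line is expected to hold 'vocab_size dim'; every later
    line is 'word v1 v2 ... v_dim'. `limit` caps how many vectors are
    read, which is useful because the full Tencent file is very large.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        header = f.readline().split()
        vocab_size, dim = int(header[0]), int(header[1])
        for i, line in enumerate(f):
            if limit is not None and i >= limit:
                break
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim
```

In practice you would only keep vectors for words that actually appear in the lexicon matched against your training data, rather than loading the whole vocabulary.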
1. Convert the .char.bmes file to a .json file: python3 to_json.py
2. Run the shell script: sh run_demo.sh
The model was trained in distributed mode, so it cannot be loaded directly in single-GPU mode. Follow the steps below to patch the transformers source before loading the checkpoints.
Enter the transformers source directory: cd source/transformers-master
Open modeling_utils.py and go to around line 995.
Change the code as follows:
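The exact patch is not reproduced here, but the underlying issue is that checkpoints saved from a model wrapped in `torch.nn.parallel.DistributedDataParallel` prefix every parameter name with `module.`. A sketch of the kind of key renaming such a patch performs (the helper name `strip_module_prefix` is ours, not the repo's):

```python
def strip_module_prefix(state_dict):
    """Remove the 'module.' prefix that DistributedDataParallel adds to
    every parameter name, so the weights load into a plain single-GPU
    model whose parameter names have no such prefix."""
    prefix = "module."
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }
```

Applied to a loaded checkpoint, `strip_module_prefix(torch.load(path))` yields a state dict whose keys match an unwrapped model.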
Build and install the revised source: python3 setup.py install
@inproceedings{liu-etal-2021-lexicon,
title = "Lexicon Enhanced {C}hinese Sequence Labeling Using {BERT} Adapter",
author = "Liu, Wei and
Fu, Xiyan and
Zhang, Yue and
Xiao, Wenming",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.454",
doi = "10.18653/v1/2021.acl-long.454",
pages = "5847--5858"
}