redreamality / RERE-relation-extraction

code for paper Revisiting the Negative Data of Distantly Supervised Relation Extraction

Revisiting the Negative Data of Distantly Supervised Relation Extraction


This repository contains the source code and dataset for the paper: Revisiting the Negative Data of Distantly Supervised Relation Extraction. Chenhao Xie, Jiaqing Liang, Jingping Liu, Chengsong Huang, Wenhao Huang, Yanghua Xiao. ACL 2021.

How to reproduce

Install all the dependencies in requirements.txt.

Download the BERT-related files by following the instructions in tfhub/*/readme.md. Run rere/bert-to-h5.py to produce bert_uncased.h5 and chinese_roberta_wwm_ext.h5.

The models ReRe and ReRe_LSTM from Table 3 are provided in the directories rere and rere_lstm. extraction.py is the main file. To train a model, run python extraction.py {data_set_name} train. To load a trained model and predict, run python extraction.py {data_set_name}, for example python extraction.py NYT11-HRL. We can provide the pre-trained models for reproducing exactly the same results as in the paper.
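The two invocation forms described above can be sketched as a small helper that builds the argument list. This is only an illustration of the command format given in this README; the function name build_extraction_command is hypothetical and is not part of the repository.

```python
from typing import List


def build_extraction_command(data_set_name: str, train: bool = False) -> List[str]:
    """Build the argv list for extraction.py in the form this README describes.

    Hypothetical helper, not part of the repository.
    train=True  -> python extraction.py {data_set_name} train
    train=False -> python extraction.py {data_set_name}   (load the model and predict)
    """
    if not data_set_name:
        raise ValueError("data_set_name must be non-empty")
    command = ["python", "extraction.py", data_set_name]
    if train:
        command.append("train")
    return command


if __name__ == "__main__":
    # Print both forms for the data set named in the README example.
    print(" ".join(build_extraction_command("NYT11-HRL", train=True)))
    print(" ".join(build_extraction_command("NYT11-HRL")))
```

The returned list can be passed to subprocess.run from whichever directory holds the extraction.py you want to use (rere or rere_lstm).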

The data sets used in Figure 3 are provided in data/FNexp. They are generated by data/FN_data_gen.py. To train the corresponding model, run python extraction.py FNexp/{data_set_name}@{ratio} train, for example python extraction.py FNexp/ske2019@0.1 train.
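The naming of the FNexp data set argument can be illustrated with a short sketch that formats and splits strings such as FNexp/ske2019@0.1. Both function names are hypothetical, and the value after @ is assumed here to be a numeric ratio. How extraction.py itself parses this argument is not shown in this README, and the only value it mentions is 0.1, so check data/FNexp for the variants that actually exist.

```python
from typing import Tuple


def fnexp_dataset_arg(data_set_name: str, ratio: float) -> str:
    """Format a data set argument as FNexp/{data_set_name}@{value}.

    Hypothetical helper; the value after '@' is assumed to be a ratio.
    """
    return f"FNexp/{data_set_name}@{ratio}"


def parse_fnexp_dataset_arg(arg: str) -> Tuple[str, float]:
    """Split an FNexp argument back into (data_set_name, ratio)."""
    prefix, _, rest = arg.partition("/")
    if prefix != "FNexp" or "@" not in rest:
        raise ValueError(f"not an FNexp data set argument: {arg!r}")
    name, _, ratio_text = rest.rpartition("@")
    return name, float(ratio_text)


if __name__ == "__main__":
    arg = fnexp_dataset_arg("ske2019", 0.1)
    print(arg)
    print(parse_fnexp_dataset_arg(arg))
```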

Datasets

Datasets are provided separately in this repo, including two new datasets, NYT21 and SKE21 (the labeled test set of SKE2019).

Usage and troubleshooting

The bert4keras package that we provide in ./rere/BERT_TF2 can alternatively be installed via pip, but we do not guarantee that its latest version works with our code. If you run into trouble, please run pip uninstall bert4keras. If the pretrained models are needed for reproduction, please contact the authors; we are willing to provide them.

Environment details

NVIDIA-SMI 455.23.04

Driver Version: 455.23.04

CUDA Version: 11.1

GeForce RTX 3090

Python 3.7.9

requirements.txt is provided for setting up the virtual environment in conda.

Citation

@inproceedings{xie2021revisiting,
  title={Revisiting the Negative Data of Distantly Supervised Relation Extraction},
  author={Xie, Chenhao and Liang, Jiaqing and Liu, Jingping and Huang, Chengsong and Huang, Wenhao and Xiao, Yanghua},
  booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics},
  year={2021}
}