
Detecting Hallucinated Content in Conditional Neural Sequence Generation

This repository contains code for running hallucination detection from the following paper.

Detecting Hallucinated Content in Conditional Neural Sequence Generation
Chunting Zhou, Graham Neubig, Jiatao Gu, Mona Diab, Paco Guzman, Luke Zettlemoyer, Marjan Ghazvininejad
Findings of ACL 2021


Model

Requirement

Under your anaconda environment, please install fairseq from source locally with:

python setup.py build_ext --inplace

Below we explain how to train a hallucination detection model on your own bi-text dataset and how to make predictions with it.

Data

1. Training data used in the paper

We used the large multi-domain dataset collected by Wang et al. (2020), which includes four domains (law, news, patent, TV subtitles). Since it includes data from the LDC, we cannot release it.

2. Human annotation benchmarks

This repo includes two human-annotated benchmark datasets, for MT and summarization (XSum) respectively, under ./eval_data/.

Machine Translation (./eval_data/mt/):

We train two MT systems (a standard Transformer and a finetuned MBART) on the simulated low-resource (patent-domain) training data and evaluate them on the patent domain. We asked bilingual speakers to annotate token-level hallucinations in the machine translations of 150 sentences from the patent test set. Under ./eval_data/mt/:

*source — raw source sentences
*target — model outputs
*ref — reference translations
*label — token-level annotations of *target, where 1 marks a hallucinated word and 0 a faithful translation word
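
As a quick sanity check, the label files can be read alongside the targets. A minimal sketch, assuming the labels are whitespace-separated and aligned one-to-one with the target tokens (the test.* file names are placeholders):

```python
# Minimal sketch: load one MT benchmark split and report the token-level
# hallucination rate. Exact file names are placeholders; we assume labels
# are whitespace-separated and aligned one-to-one with target tokens.
from pathlib import Path

data_dir = Path("./eval_data/mt")

def read_lines(path):
    return path.read_text(encoding="utf-8").splitlines()

targets = read_lines(data_dir / "test.target")  # placeholder file name
labels = read_lines(data_dir / "test.label")    # placeholder file name

n_tokens = n_hallucinated = 0
for tgt, lab in zip(targets, labels):
    toks, tags = tgt.split(), lab.split()
    assert len(toks) == len(tags), "labels must align with target tokens"
    n_tokens += len(toks)
    n_hallucinated += sum(t == "1" for t in tags)

print(f"hallucinated tokens: {n_hallucinated}/{n_tokens} "
      f"({100.0 * n_hallucinated / n_tokens:.1f}%)")
```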

Create Synthetic Data

To train a hallucination prediction model on your own bi-text dataset, the first step is to create synthetic labeled data. In the paper this is decomposed into two sub-steps: generating noised versions of the target sentences with a pretrained denoising model (BART), and assigning token-level hallucination labels by aligning each noised target back to the original.
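
To illustrate the labeling sub-step: tokens in the noised target that cannot be aligned back to the original are treated as hallucinated. A minimal sketch of this idea using Python's difflib (an illustration only, not the repo's actual synthetic-data script):

```python
# Label tokens of a noised target against the original target: tokens that
# survive the alignment get 0 (faithful); tokens inserted or replaced by
# the noising process get 1 (hallucinated). Illustrative sketch only.
from difflib import SequenceMatcher

def label_noised_tokens(original: str, noised: str) -> list:
    orig_toks, noised_toks = original.split(), noised.split()
    labels = [1] * len(noised_toks)  # assume hallucinated until matched
    matcher = SequenceMatcher(a=orig_toks, b=noised_toks, autojunk=False)
    for block in matcher.get_matching_blocks():
        for j in range(block.b, block.b + block.size):
            labels[j] = 0  # token aligned to the original -> faithful
    return labels

print(label_noised_tokens(
    "the patent was filed in 2004",
    "the patent was secretly filed in 2004",
))  # [0, 0, 0, 1, 0, 0, 0]
```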

Train a Hallucination Detection Model

You can finetune XLM-R or RoBERTa on the binarized data created above. We provide batch scripts to run this for MT and abstractive summarization respectively.

sbatch ./train_exps/example_finetune_mt.sh path/to/the/binarized/data

or

sbatch ./train_exps/example_finetune_xsum.sh path/to/the/binarized/data

You may want to tune the hyperparameters inside the scripts for better performance, e.g. --dropout-ref (which randomly drops reference words to prevent the model from simply learning edit distance) and --max-update.

Evaluation

We provide evaluation scripts for the benchmark datasets under ./eval_data: ./util_scripts/eval_predict_hallucination_mt.py for MT and ./util_scripts/eval_predict_hallucination_xsum.py for summarization (they differ only slightly). First specify the path to the saved detection model directory and the training data path at lines 12-13 of the script, then run it.
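
Since the hallucination labels are binary per token, detection quality can be summarized with token-level precision, recall, and F1 over the hallucinated class. A minimal sketch, assuming the label-file format described above (file names are placeholders):

```python
# Token-level F1 of predicted hallucination labels against gold labels.
# Assumes both files contain one whitespace-separated label sequence per
# line, aligned with the target tokens (file names are placeholders).
def read_labels(path):
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f]

gold = read_labels("test.label")
pred = read_labels("predictions.label")

tp = fp = fn = 0
for g_seq, p_seq in zip(gold, pred):
    for g, p in zip(g_seq, p_seq):
        tp += (g == "1" and p == "1")
        fp += (g == "0" and p == "1")
        fn += (g == "1" and p == "0")

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```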

Pretrained Models

You can download our trained models for these benchmark datasets (zhen-MT and XSum) and evaluate them with the scripts above: set models to ['path/to/the/unzipped/folder'] and datapath to the data folder inside the unzipped file.

Prediction

To use the trained model for hallucination prediction on your own input, we provide an example script ./util_scripts/predict_hallucination_mt.py that predicts labels for a hypothesis file conditioned on its source file. Again, specify the paths to your input files, the trained model, the training data, and the output directory at lines 12-23, then run it.
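
The trained checkpoint is a fairseq XLM-R/RoBERTa-style model, so it can be loaded with fairseq's hub interface. The sketch below only shows loading and feature extraction; the per-token label prediction and input formatting are handled by the repo's prediction script (paths and the checkpoint file name are placeholders):

```python
# Minimal sketch: load the finetuned checkpoint with fairseq's RoBERTa
# hub interface. Paths are placeholders; predict_hallucination_mt.py
# performs the actual per-token label prediction on top of this.
from fairseq.models.roberta import RobertaModel

model = RobertaModel.from_pretrained(
    "path/to/the/unzipped/folder",         # saved detection model directory
    checkpoint_file="checkpoint_best.pt",  # placeholder checkpoint name
    data_name_or_path="path/to/binarized/training/data",
)
model.eval()

# Encode a (source, hypothesis) pair and extract contextual features;
# hallucination labels are predicted per hypothesis token from these.
tokens = model.encode("source sentence here", "hypothesis sentence here")
features = model.extract_features(tokens)  # (1, seq_len, hidden_dim)
print(features.shape)
```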

Scripts for Word-level Quality Estimation

The directory word_level_qe/ contains scripts for both supervised and unsupervised experiments on word-level quality estimation from the WMT18 shared task (QE Task 2).
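
WMT word-level QE scores OK/BAD tags per token, which map directly onto the 0/1 hallucination labels used here (0 -> OK, 1 -> BAD). A minimal conversion sketch (file names are placeholders):

```python
# Convert 0/1 hallucination labels to WMT-style OK/BAD word tags.
# File names are placeholders.
with open("predictions.label", encoding="utf-8") as fin, \
     open("predictions.tags", "w", encoding="utf-8") as fout:
    for line in fin:
        tags = ["BAD" if t == "1" else "OK" for t in line.split()]
        fout.write(" ".join(tags) + "\n")
```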

Reference

@inproceedings{zhou21aclfindings,
    title = {Detecting Hallucinated Content in Conditional Neural Sequence Generation},
    author = {Chunting Zhou and Graham Neubig and Jiatao Gu and Mona Diab and Francisco Guzmán and Luke Zettlemoyer and Marjan Ghazvininejad},
    booktitle = {Findings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP Findings)},
    address = {Virtual},
    month = {August},
    url = {https://arxiv.org/abs/2011.02593},
    year = {2021}
}