This project includes the source code for the paper SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization, appearing at ACL 2020.
Highlighted Features
Contact person: Yang Gao, yang.gao@rhul.ac.uk
https://sites.google.com/site/yanggaoalex/home
Don't hesitate to send us an e-mail or report an issue if something is broken or if you have further questions.
Given the source documents and some to-be-evaluated summaries, you can produce the unsupervised metrics for the summaries with the code below:
from ref_free_metrics.supert import Supert
from utils.data_reader import CorpusReader
# read docs and summaries
reader = CorpusReader('data/topic_1')
source_docs = reader()
summaries = reader.readSummaries()
# compute the Supert scores
supert = Supert(source_docs, ref_metric='top15')
scores = supert(summaries)
In the example above, Supert extracts the top 15 sentences from each source document to build the pseudo reference summaries, and rates the summaries by measuring their semantic similarity to the pseudo references.
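To make the idea of a pseudo reference concrete, here is a minimal, self-contained sketch. It is not the Supert implementation: documents are assumed to be plain strings, the sentence splitter is naive, and a bag-of-words cosine stands in for the semantic similarity measure. The helper names (split_sentences, build_pseudo_reference, toy_score) are hypothetical and do not exist in this repository.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter on ., ! and ?; a real pipeline would use a proper tokenizer.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]


def build_pseudo_reference(docs, top_n=15):
    # Keep the first top_n sentences of each document (the idea behind 'top15').
    pseudo_ref = []
    for doc in docs:
        pseudo_ref.extend(split_sentences(doc)[:top_n])
    return pseudo_ref


def bow_vector(sentences):
    # Assumption: lower-cased word counts stand in for sentence embeddings.
    return Counter(w for s in sentences for w in re.findall(r'\w+', s.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_score(summary, docs, top_n=15):
    # Compare the summary with the pseudo reference built from the documents.
    pseudo_ref = build_pseudo_reference(docs, top_n)
    return cosine(bow_vector(split_sentences(summary)), bow_vector(pseudo_ref))
```

Raising top_n (for example to 30) lengthens the pseudo reference in this sketch.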
The same code used for evaluating multi-doc summaries can also rate single-doc summaries. In addition, you may consider using more sentences from the input doc to build the pseudo reference by replacing the argument 'top15' in the code above with, e.g., 'top30', so that the first 30 (instead of 15) sentences are used.
We study the influence of pseudo reference length on the performance of Supert for single-doc summaries at summ_eval, where we compare the correlation between Supert scores and human ratings from the SummEval dataset.
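As a rough illustration of what comparing a metric with human ratings involves, the sketch below computes a rank correlation in pure Python. The choice of Spearman's coefficient is an assumption, since the text above does not say which correlation measure is used in summ_eval, and the function names are hypothetical.

```python
def average_ranks(values):
    # Rank values from 1..n; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant inputs
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_scores, human_ratings):
    # Spearman's rho is the Pearson correlation of the two rank vectors.
    return pearson(average_ranks(metric_scores), average_ranks(human_ratings))
```

Here metric_scores would hold one metric score per summary and human_ratings the matching human judgments, in the same order.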
You can also use the unsupervised metrics as rewards to train an RL-based summarizer to generate summaries:
from generate_summary_rl import RLSummarizer
from ref_free_metrics.supert import Supert
from utils.data_reader import CorpusReader
# read source documents
reader = CorpusReader('data/topic_1')
source_docs = reader()
# generate summaries using reinforcement learning, with supert as reward function
supert = Supert(source_docs)
rl_summarizer = RLSummarizer(reward_func=supert)
summary = rl_summarizer.summarize(source_docs, summ_max_len=100)
# print out the generated summary
print(summary)
You can also use the unsupervised metrics as the fitness function to guide a genetic algorithm to search for the optimal summary. See the example provided in generate_summary_ga.py.
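For readers unfamiliar with the approach, the following is a minimal sketch of a genetic algorithm that selects sentences under a word budget. It is not the code in generate_summary_ga.py: the function genetic_search, its parameters, and the assumption that fitness is any callable taking a list of sentences and returning a number are all hypothetical.

```python
import random


def genetic_search(sentences, fitness, max_words=100, pop_size=20,
                   generations=30, seed=0):
    if not sentences:
        return []
    rng = random.Random(seed)
    n = len(sentences)

    def decode(mask):
        # Add selected sentences in document order while the budget allows.
        summary, used = [], 0
        for keep, sent in zip(mask, sentences):
            length = len(sent.split())
            if keep and used + length <= max_words:
                summary.append(sent)
                used += length
        return summary

    def score(mask):
        return fitness(decode(mask))

    # Each individual is a boolean mask over the candidate sentences.
    population = [[rng.random() < 0.5 for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[:pop_size // 2]  # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]          # one-point crossover
            flip = rng.randrange(n)
            child[flip] = not child[flip]      # single-bit mutation
            children.append(child)
        population = parents + children
    return decode(max(population, key=score))
```

In this sketch an unsupervised metric would be wrapped so that it accepts a candidate summary and returns its score, then passed as fitness.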
If human-written reference summaries are available (assume they are at data/topic_1/references), you can also evaluate the quality of the generated summary against the references using ROUGE:
refs = reader.readReferences()
for ref in refs:
    rouge_scores = evaluate_summary_rouge(summary, ref)
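To show what a reference-based score of this kind measures, here is a simplified unigram-overlap sketch in the spirit of ROUGE-1. It is not the ROUGE-1.5.5 toolkit and will not reproduce its numbers (no stemming, stopword handling, or length limits). Summaries and references are assumed to be plain strings, and the function names are hypothetical.

```python
import re
from collections import Counter


def rouge1(summary, reference):
    # Count lower-cased word tokens in the summary and in the reference.
    summ = Counter(re.findall(r'\w+', summary.lower()))
    ref = Counter(re.findall(r'\w+', reference.lower()))
    # Clipped overlap: each word counts at most as often as in the reference.
    overlap = sum((summ & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(summ.values()) if summ else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {'recall': recall, 'precision': precision, 'f1': f1}


def average_over_references(summary, references):
    # Average each score over all available references (must be non-empty).
    scores = [rouge1(summary, ref) for ref in references]
    return {key: sum(s[key] for s in scores) / len(scores)
            for key in scores[0]}
```

Averaging over references is one common way to combine the per-reference scores produced in a loop like the one above.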
pip3 install -r requirements.txt
mv ROUGE-RELEASE-1.5.5 rouge/
Apache License Version 2.0