xcfcode / PLM_annotator

Code for our ACL 2021 paper: Language Model as an Annotator: Exploring DialoGPT for Dialogue Summarization

Rouge Score #2

Open · Hannibal046 opened this issue 3 years ago

Hannibal046 commented 3 years ago

Hello. Could you please tell me whether the ROUGE scores in your paper are recall (R), precision (P), or F-score (F)? And for the BART baseline, did you do any preprocessing, such as deleting \n\r or adding special tokens? Thanks.

xcfcode commented 3 years ago

Hi, we use the F-score for ROUGE. For the baseline data, I have uploaded it here, under the clean_samsum dir.
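To make the P/R/F distinction concrete, here is a minimal, simplified ROUGE-N sketch (not py-rouge's code). It assumes lowercased whitespace tokenization and no stemming, so its numbers can differ slightly from py-rouge. The `alpha` weighting mirrors the `alpha=0.5 # Default F1_score` argument used in the py-rouge example later in this thread; with `alpha=0.5` the F-score is the harmonic mean of P and R.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1, alpha: float = 0.5):
    """Simplified ROUGE-N for a single candidate/reference pair.

    Tokenizes by lowercasing and splitting on whitespace (no stemming).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    p = overlap / sum(cand.values()) if cand else 0.0  # precision
    r = overlap / sum(ref.values()) if ref else 0.0    # recall
    # Weighted F-measure; alpha = 0.5 gives the usual F1.
    f = 0.0 if p == 0 or r == 0 else p * r / ((1 - alpha) * p + alpha * r)
    return {"p": p, "r": r, "f": f}


scores = rouge_n("the cat sat on the mat", "the cat is on the mat", n=2)
print(scores)
```

The reported number corresponds to the `"f"` entry; `"p"` and `"r"` are what py-rouge also returns alongside it.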

Hannibal046 commented 3 years ago

Thanks for your reply. When reproducing your results, I find that ROUGE-1 and ROUGE-2 exactly match your reported numbers, but ROUGE-L is 1 point lower. There must be something wrong with my metric_testing function; could you please help me?

The hyps.txt and golden.txt files are from this repository.

BTW, wouldn't the lowercased names in your golden.txt hurt the ROUGE score, since the names in your output are capitalized?

[Screenshot attached]
xcfcode commented 3 years ago

Hi, py-rouge lowercases the text internally: https://github.com/Diego999/py-rouge/blob/16f225b1f46e9d382f1bf5170546da218ee98003/rouge/rouge.py#L697. I also provide test code; have you tried it? https://github.com/xcfcode/PLM_annotator/blob/main/bart/py_rouge_test.py
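Because py-rouge lowercases both sides before matching, capitalized names in the output still match lowercased names in golden.txt. The sketch below is independent of py-rouge and only illustrates why casing would matter to a naive, case-sensitive scorer; the example sentence and the `unigram_recall` helper are made up for illustration.

```python
from collections import Counter


def unigram_recall(candidate: str, reference: str, lowercase: bool) -> float:
    """Fraction of reference unigrams matched (with clipping) by the candidate."""
    if lowercase:
        candidate, reference = candidate.lower(), reference.lower()
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    return sum((cand & ref).values()) / sum(ref.values())


hyp = "Amanda baked cookies for Jerry"
ref = "amanda baked cookies for jerry"
print(unigram_recall(hyp, ref, lowercase=False))  # 0.6: "Amanda" and "Jerry" miss
print(unigram_recall(hyp, ref, lowercase=True))   # 1.0
```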

Hannibal046 commented 3 years ago

Hi, I got it! I used your test code and got the same results. But in py-rouge's example, the hypotheses and references separate sentences with \n. I tried that with your example and got the following results. Is this more accurate? Thanks.

[Screenshot of the results]
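One plausible reason splitting on \n raises ROUGE-L, assuming py-rouge computes summary-level ROUGE-L (the union-LCS variant from Lin, 2004) over \n-separated sentences: with one long line, a single LCS must preserve order across the whole summary, whereas union-LCS matches each reference sentence against every candidate sentence, so reordered sentences still count. This is a simplified sketch, not py-rouge's implementation; it uses F1 only and omits the token-count bookkeeping that fuller implementations use for repeated words.

```python
def lcs_table(a, b):
    """Dynamic-programming table for the longest common subsequence of a and b."""
    t = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            t[i][j] = t[i - 1][j - 1] + 1 if x == y else max(t[i - 1][j], t[i][j - 1])
    return t


def lcs_positions(ref, cand):
    """Indices of ref tokens that lie on one LCS of ref and cand."""
    t = lcs_table(ref, cand)
    i, j, hits = len(ref), len(cand), set()
    while i > 0 and j > 0:
        if ref[i - 1] == cand[j - 1]:
            hits.add(i - 1)
            i, j = i - 1, j - 1
        elif t[i - 1][j] >= t[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return hits


def f1(hits, n_cand, n_ref):
    if hits == 0:
        return 0.0
    p, r = hits / n_cand, hits / n_ref
    return 2 * p * r / (p + r)


def rouge_l_flat(cand_sents, ref_sents):
    """Sentence-level ROUGE-L: each summary is treated as one token sequence."""
    cand = [w for s in cand_sents for w in s.split()]
    ref = [w for s in ref_sents for w in s.split()]
    return f1(lcs_table(ref, cand)[-1][-1], len(cand), len(ref))


def rouge_l_summary(cand_sents, ref_sents):
    """Summary-level ROUGE-L: union of LCS hits per reference sentence."""
    cands = [s.split() for s in cand_sents]
    refs = [s.split() for s in ref_sents]
    hits = 0
    for r in refs:
        union = set()
        for c in cands:
            union |= lcs_positions(r, c)
        hits += len(union)
    n_cand = sum(len(c) for c in cands)
    n_ref = sum(len(r) for r in refs)
    return f1(hits, n_cand, n_ref)


cand = ["dogs bark loudly", "the cat sat"]
ref = ["the cat sat", "dogs bark loudly"]
print(rouge_l_flat(cand, ref))     # 0.5
print(rouge_l_summary(cand, ref))  # 1.0
```

The toy example swaps sentence order, which is the extreme case; on real summaries the gap is smaller but goes in the same direction whenever content is reordered across sentences.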
Hannibal046 commented 3 years ago

There also seems to be a bug in how py-rouge applies weight_factor to ROUGE-L: https://github.com/Diego999/py-rouge/issues/10

xcfcode commented 3 years ago

Thanks for the information. How did you get the above results? Could you please share some details? The ROUGE-L score looks much higher.

xcfcode commented 3 years ago

On my side, I just run python py_rouge_test.py -c summaries/samsum.txt

Hannibal046 commented 3 years ago

You can simply add

candidates = [x.replace(' . ',' . \n') for x in candidates]
references = [x.replace(' . ',' . \n') for x in references]

to your py_rouge_test.py, and you will get a higher ROUGE-L. Below is the official code example from py-rouge; notice that sentences are separated by \n.

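import rouge  # the py-rouge package (pip install py-rouge)

def prepare_results(p, r, f):
    # Note: `metric` is not a parameter; it is read from the module-level
    # `for metric, results in ...` loop below at call time.
    return '\t{}:\t{}: {:5.2f}\t{}: {:5.2f}\t{}: {:5.2f}'.format(metric, 'P', 100.0 * p, 'R', 100.0 * r, 'F1', 100.0 * f)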

for aggregator in ['Avg', 'Best', 'Individual']:
    print('Evaluation with {}'.format(aggregator))
    apply_avg = aggregator == 'Avg'
    apply_best = aggregator == 'Best'

    evaluator = rouge.Rouge(metrics=['rouge-n', 'rouge-l', 'rouge-w'],
                           max_n=4,
                           limit_length=True,
                           length_limit=100,
                           length_limit_type='words',
                           apply_avg=apply_avg,
                           apply_best=apply_best,
                           alpha=0.5, # Default F1_score
                           weight_factor=1.2,
                           stemming=True)

    hypothesis_1 = "King Norodom Sihanouk has declined requests to chair a summit of Cambodia 's top political leaders , saying the meeting would not bring any progress in deadlocked negotiations to form a government .\nGovernment and opposition parties have asked King Norodom Sihanouk to host a summit meeting after a series of post-election negotiations between the two opposition groups and Hun Sen 's party to form a new government failed .\nHun Sen 's ruling party narrowly won a majority in elections in July , but the opposition _ claiming widespread intimidation and fraud _ has denied Hun Sen the two-thirds vote in parliament required to approve the next government .\n"
    references_1 = ["Prospects were dim for resolution of the political crisis in Cambodia in October 1998.\nPrime Minister Hun Sen insisted that talks take place in Cambodia while opposition leaders Ranariddh and Sam Rainsy, fearing arrest at home, wanted them abroad.\nKing Sihanouk declined to chair talks in either place.\nA U.S. House resolution criticized Hun Sen's regime while the opposition tried to cut off his access to loans.\nBut in November the King announced a coalition government with Hun Sen heading the executive and Ranariddh leading the parliament.\nLeft out, Sam Rainsy sought the King's assurance of Hun Sen's promise of safety and freedom for all politicians.",
                    "Cambodian prime minister Hun Sen rejects demands of 2 opposition parties for talks in Beijing after failing to win a 2/3 majority in recent elections.\nSihanouk refuses to host talks in Beijing.\nOpposition parties ask the Asian Development Bank to stop loans to Hun Sen's government.\nCCP defends Hun Sen to the US Senate.\nFUNCINPEC refuses to share the presidency.\nHun Sen and Ranariddh eventually form a coalition at summit convened by Sihanouk.\nHun Sen remains prime minister, Ranariddh is president of the national assembly, and a new senate will be formed.\nOpposition leader Rainsy left out.\nHe seeks strong assurance of safety should he return to Cambodia.\n",
                    ]

    hypothesis_2 = "China 's government said Thursday that two prominent dissidents arrested this week are suspected of endangering national security _ the clearest sign yet Chinese leaders plan to quash a would-be opposition party .\nOne leader of a suppressed new political party will be tried on Dec. 17 on a charge of colluding with foreign enemies of China '' to incite the subversion of state power , '' according to court documents given to his wife on Monday .\nWith attorneys locked up , harassed or plain scared , two prominent dissidents will defend themselves against charges of subversion Thursday in China 's highest-profile dissident trials in two years .\n"
    references_2 = "Hurricane Mitch, category 5 hurricane, brought widespread death and destruction to Central American.\nEspecially hard hit was Honduras where an estimated 6,076 people lost their lives.\nThe hurricane, which lingered off the coast of Honduras for 3 days before moving off, flooded large areas, destroying crops and property.\nThe U.S. and European Union were joined by Pope John Paul II in a call for money and workers to help the stricken area.\nPresident Clinton sent Tipper Gore, wife of Vice President Gore to the area to deliver much needed supplies to the area, demonstrating U.S. commitment to the recovery of the region.\n"

    all_hypothesis = [hypothesis_1, hypothesis_2]
    all_references = [references_1, references_2]

    scores = evaluator.get_scores(all_hypothesis, all_references)

    for metric, results in sorted(scores.items(), key=lambda x: x[0]):
        if not apply_avg and not apply_best: # value is a type of list as we evaluate each summary vs each reference
            for hypothesis_id, results_per_ref in enumerate(results):
                nb_references = len(results_per_ref['p'])
                for reference_id in range(nb_references):
                    print('\tHypothesis #{} & Reference #{}: '.format(hypothesis_id, reference_id))
                    print('\t' + prepare_results(results_per_ref['p'][reference_id], results_per_ref['r'][reference_id], results_per_ref['f'][reference_id]))
            print()
        else:
            print(prepare_results(results['p'], results['r'], results['f']))
    print()
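The `replace(' . ', ' . \n')` trick suggested in this thread only breaks after periods, so sentences ending in "?" or "!" stay on one line. A slightly more general sketch is below; it assumes, as the `' . '` pattern implies, that the text is already tokenized with spaces around punctuation. `add_sentence_breaks` is a hypothetical helper, not part of py-rouge or this repository.

```python
import re


def add_sentence_breaks(text: str) -> str:
    """Put each sentence of space-tokenized text on its own line.

    A newline is inserted after a standalone ".", "?" or "!" that is
    followed by more text, so the last sentence gets no trailing newline.
    """
    return re.sub(r" ([.!?]) (?=\S)", r" \1\n", text)


example = "Amanda : hi . Tom : ok ? bye ."
print(example.replace(' . ', ' . \n'))  # the "?" boundary is not split
print(add_sentence_breaks(example))     # three lines
```

In py_rouge_test.py it would replace the two list comprehensions, e.g. `candidates = [add_sentence_breaks(x) for x in candidates]` and likewise for `references`. Whitespace left before the newline does not matter, since py-rouge tokenizes each line.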