huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Pegasus- Arxiv predicts random text #7163

Closed MichaelJanz closed 3 years ago

MichaelJanz commented 4 years ago

Environment info

Who can help

@sshleifer

Information

Model I am using: Pegasus-Arxiv (google/pegasus-arxiv)

The problem arises when using the sample script below.

To reproduce

Steps to reproduce the behavior:

  1. Download the pegasus-arxiv model
  2. Use the sample script below
  3. You will get the result below:
import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

src_text ="""In this sequel to the phenomenally popular Harry Potter and the Sorcerer’s Stone, Harry returns to Hogwarts School of Witchcraft and Wizardry for his second year after a miserable summer with his Muggle (nonmagical) relatives. Once again, Harry’s school experiences are colored by encounters with genial ghosts and antagonistic teachers, by the rivalry between good-guy Gryffindor House and slimy Slytherin House, and by an ominous mystery to be solved involving Harry’s archenemy, the dark sorcerer Lord Voldemort. Once again, the attraction of Rowling’s traditional British school story is magnified tenfold by the fantasy elements superimposed upon it. The atmosphere Rowling creates is unique; the story whizzes along; Harry is an unassuming and completely sympathetic hero. But, truth to tell, you may feel as if you’ve read it all before. Rowling clearly hit on a winning formula with the first Harry Potter book; the second book — though still great fun — feels a tad, well, formulaic."""

model_name = 'google/pegasus-arxiv'
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

# Tokenize the input, generate with the model's default generation settings,
# and decode the summary back to text
batch = tokenizer.prepare_seq2seq_batch([src_text], truncation=True, padding='longest').to(torch_device)
translated = model.generate(**batch)
tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
print(tgt_text)

Expected behavior

I expect a clear summary of the text, but instead I receive output that has no connection to the input and reads like a scientific paper:

['this is the first of a series of papers in which we address the question of whether or not the laws of thermodynamics are valid in the limit of infinitely many degrees of freedom. we show that the laws of thermodynamics are valid in the limit of infinitely many degrees of freedom. this is the first of a series of papers in which we address the question of whether or not the laws of thermodynamics are valid in the limit of infinitely many degrees of freedom. we show that the laws of thermodynamics are valid in the limit of infinitely many degrees of freedom. [ theorem]acknowledgement [ theorem]algorithm [ theorem]axiom [ theorem]claim [ theorem]conclusion [ theorem]condition [ theorem]conjecture [ theorem]corollary [ theorem]criterion [ theorem]definition [ theorem]example [ theorem]exercise [ theorem]lemma [ theorem]notation [ theorem]problem [ theorem]proposition [ theorem]remark [ theorem]solution [ theorem]summary this is the first of a series of papers in which we address the question of whether or not the laws of thermodynamics are valid in the limit of infinitely many degrees of freedom.']

Am I doing something wrong or is it the model? Thanks

Skyy93 commented 4 years ago

This seems to be a problem with the 'google/pegasus-arxiv' model. When you use 'google/pegasus-xsum', you get the following output: Harry Potter and the Philosopher’s Stone is the seventh and final book in JK Rowling’s Harry Potter series.
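A rough sketch for comparing checkpoints with the script from the original post (src_text is the same review text as above; only the checkpoint name changes):

import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

src_text = "..."  # the Harry Potter review text from the original post
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Swap in different Pegasus fine-tunes; everything else stays the same
for model_name in ['google/pegasus-arxiv', 'google/pegasus-xsum']:
    tokenizer = PegasusTokenizer.from_pretrained(model_name)
    model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)
    batch = tokenizer.prepare_seq2seq_batch([src_text], truncation=True, padding='longest').to(torch_device)
    summary = tokenizer.batch_decode(model.generate(**batch), skip_special_tokens=True)
    print(model_name, summary)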

MichaelJanz commented 4 years ago

Yes, I tried different pegasus models (including a lot of other models), and pegasus-large, e.g., outputs this (which I think is really good!): In this sequel to the phenomenally popular Harry Potter and the Sorcerer’s Stone, Harry returns to Hogwarts School of Witchcraft and Wizardry for his second year after a miserable summer with his Muggle (nonmagical) relatives. Rowling clearly hit on a winning formula with the first Harry Potter book; the second book — though still great fun — feels a tad, well, formulaic.'

while pegasus-multinews outputs well-written text that is unfortunately wrong in content: – The seventh and final book in the Harry Potter series, Harry Potter and the Sorcerer\'s Stone, is out today. The sixth book in the series, Harry Potter and the Deathly Hallows, was released in the US in advance of tomorrow\'s release in the UK. Here\'s what critics are saying about the seventh and final book in the series: The plot is still compelling, but the book "feels a tad, well, formulaic," writes James Poniewozik in Time. "The atmosphere Rowling creates is unique; the story whizzes along; Harry is an unassuming and completely sympathetic hero. But, truth to tell, you may feel as if you\'ve read it all before. Rowling clearly hit on a winning formula with the first Harry Potter book; the second book—though still great fun—feels a tad, well, formulaic."'

Gigaword and billsum also output texts that are not useful.

One more question: pegasus-large and pegasus-cnn_dailymail both only return the most important sentences, while pegasus-multinews actually generates new text. I was hoping for the same from the arxiv model; is there a reason it differs in that way?

sshleifer commented 4 years ago

pegasus-arxiv is trained on, and expects, scientific text. pegasus-multinews expects news, I presume.

If you want to demonstrate a bug, try running an evaluation on a public dataset from the datasets package and posting the result, as in #6844.
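Something along these lines (a rough sketch; the column names "article"/"abstract" assume the scientific_papers/arxiv config in datasets, so check the dataset card):

import torch
from datasets import load_dataset
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = 'google/pegasus-arxiv'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(device)

# A handful of real arxiv articles; "article"/"abstract" are assumed to be
# the column names of the scientific_papers dataset.
ds = load_dataset('scientific_papers', 'arxiv', split='test[:4]')

for example in ds:
    batch = tokenizer.prepare_seq2seq_batch([example['article']], truncation=True,
                                            padding='longest').to(device)
    summary = tokenizer.batch_decode(model.generate(**batch), skip_special_tokens=True)[0]
    print('GENERATED:', summary[:200])
    print('REFERENCE:', example['abstract'][:200])
    print('---')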

DavideStenner commented 3 years ago

Environment info

transformers version: 3.1.0
Platform: Windows-10
Python version: 3.7.6
PyTorch version (GPU?): 1.5.0 (False)
Using GPU in script?: no
Using distributed or parallel set-up in script?: no

To Reproduce

I found unexpected behaviour when using Pegasus-PubMed on a PubMed document.

import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

src_text ="""although the association is modest , it is important because of the increasing prevalence of metabolic syndrome and the effect that depression can have on the ability of patients to successfully make lifestyle changes and comply with medication required for hypertension and dyslipidemia . the association is demonstrated here in a general population to our knowledge for the first time , whereas earlier studies ( table 1 ) used subgroups of populations ( 813,17 ) . this distinction is important because many individuals with metabolic syndrome have diabetes , which itself is known to be associated with depression ( 5 ) . metabolic syndrome has been defined in several ways that involve quantitative anthropometric , clinical , and laboratory measurements ( 1,2 ) . for the primary assessment , we chose ncep atp iii ( 1 ) criteria , since these criteria were used in most of the previously reported studies ( 8,9,1113,17 ) ."""

model_name = 'google/pegasus-pubmed'
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

# Same pipeline as above: tokenize, generate, decode
batch = tokenizer.prepare_seq2seq_batch([src_text], truncation=True, padding='longest').to(torch_device)
translated = model.generate(**batch)
tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
print(tgt_text)

Expected behaviour

I expect a summary of the input, but I received text that is longer than the input (929 characters of input vs. 1129 characters of predicted summary). In particular, Pegasus generates new information (e.g. the sample size and the odds ratios in the output below) which isn't in the input text.

Input:

although the association is modest , it is important because of the increasing prevalence of metabolic syndrome and the effect that depression can have on the ability of patients to successfully make lifestyle changes and comply with medication required for hypertension and dyslipidemia . the association is demonstrated here in a general population to our knowledge for the first time , whereas earlier studies ( table 1 ) used subgroups of populations ( 813,17 ) . this distinction is important because many individuals with metabolic syndrome have diabetes , which itself is known to be associated with depression ( 5 ) . metabolic syndrome has been defined in several ways that involve quantitative anthropometric , clinical , and laboratory measurements ( 1,2 ) . for the primary assessment , we chose ncep atp iii ( 1 ) criteria , since these criteria were used in most of the previously reported studies ( 8,9,1113,17 ) .

Output:

['depression is known to be associated with metabolic syndrome, but its association with metabolic syndrome has not been studied in a general population. we examined the association between depression and metabolic syndrome using ncep atp iii criteria in a population - based sample ( n = 3,018 ). metabolic syndrome was defined as having three or more of the following : body mass index 25 kg / m2, waist circumference 90 cm, and triglyceride 130 mg / dl. depression was assessed using the center for epidemiologic studies depression scale ( cesds ). multivariate logistic regression was used to estimate odds ratios ( ors ) and 95% confidence intervals ( cis ) for the association between depression and metabolic syndrome. we found a significant association between depression and metabolic syndrome in a general population. after adjustment for age, sex, race / ethnicity, education, smoking, physical activity, alcohol intake, and body mass index, metabolic syndrome was associated with increased odds of depression ( or = 1.16, 95% ci 1.041.32 ). the association was stronger in women than in men.']

Is that behaviour correct?

sshleifer commented 3 years ago

Output should be < 256 tokens (not characters). The input should probably be longer (closer to 1024 tokens). Try copying something from the leftmost column of the dataset.
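A quick way to check lengths in tokens rather than characters (rough sketch, using the same tokenizer as in the script above):

from transformers import PegasusTokenizer

tokenizer = PegasusTokenizer.from_pretrained('google/pegasus-pubmed')

src_text = "..."  # the PubMed passage from the post above
summary = "..."   # the generated summary

# Token counts are what matter for the input limit and the generation budget,
# not character counts.
print(len(tokenizer(src_text)['input_ids']))
print(len(tokenizer(summary)['input_ids']))

# Optionally cap the summary length explicitly when generating:
# model.generate(**batch, max_length=256)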

sshleifer commented 3 years ago

We've now replicated that our pegasus port performs similarly well to the authors' implementation on 11 datasets, including arxiv.

[screenshot of the evaluation results spreadsheet]

Link to Spreadsheet

sshleifer commented 3 years ago

https://docs.google.com/spreadsheets/d/1ODfoK-tXOV6TLXDMnujdGLtFhA8oVTy-Cv6Ib6qKgWk/edit#gid=0