[Closed] artidoro closed this issue 2 years ago
cc @gante as well
Hi @artidoro 👋 Thank you for raising this issue!

There are actually two distinct problems; the first one was already on my radar:

1. `length_penalty` is only used with `beam_search`-based generation techniques. `facebook/bart-large-cnn` uses them by default, and `gpt2` doesn't. So, in fact, `length_penalty` has no effect on `gpt2`; the different results you're seeing are a consequence of sampling being on by default for `gpt2` (all these hidden defaults are also going through a deprecation phase). Solution: raise warnings/exceptions when these options have no effect (already being worked on).
2. A larger `length_penalty` -> larger denominator, increasing with output length -> larger score (because the sum of log-probabilities is a negative value), increasing with output length -> benefits long outputs. Solution: fix the docstring (@patrickvonplaten FYI).

I'll keep this issue open until the 2nd problem gets fixed.
Confirming point 2. @gante, we could directly fix this here as well: https://github.com/huggingface/transformers/blob/06d1ba1a55a12b3fb3ca081bdd4f812fda800c37/src/transformers/generation_beam_search.py#L140
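To make point 2 concrete, here is a minimal sketch of the beam-score normalization being discussed. It mirrors the logic around the linked line in `generation_beam_search.py`, but the `beam_score` helper and the numbers are illustrative, not the actual library code:

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    # Beam hypotheses are ranked by the sum of token log-probabilities
    # divided by length ** length_penalty.
    return sum_logprobs / (length ** length_penalty)

# Log-probabilities are negative, so a larger denominator pushes the
# score toward zero, i.e. makes it LARGER.
short_hyp = beam_score(-4.0, length=10, length_penalty=2.0)   # -4 / 100
long_hyp = beam_score(-8.0, length=40, length_penalty=2.0)    # -8 / 1600

# With length_penalty > 1.0 the longer hypothesis wins, so large values
# of length_penalty promote longer outputs, which is the opposite of
# what the current docstring claims.
assert long_hyp > short_hyp
```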
System Info

`transformers` version: 4.20.1

Who can help?

@patrickvonplaten

Information

Tasks

An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
`length_penalty` in language generation has different effects on the length of the generation: sometimes it makes the generation longer, sometimes it makes it shorter. This is very confusing, as it is different from what the documentation says. Two previous issues touch on this problem: #4915 #16930

In Bart CNN/DM, increasing `length_penalty` lengthens the output.

Output:
[{'summary_text': 'Liana Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002. She is believed to still be married to four men, and at one time, she was married to eight men at once.'}]
[{'summary_text': 'Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002. She is believed to still be married to four men.'}]
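The reproduction snippet for the BART comparison above was not preserved in this export; a sketch of what it likely looked like follows. The helper name, the article variable, and the `length_penalty` values are assumptions:

```python
def summarize_with_penalty(article: str, length_penalty: float):
    """Summarize with facebook/bart-large-cnn at a given length_penalty.

    Hypothetical reconstruction of the reproduction snippet; the import is
    kept inside the function because the model download is heavy.
    """
    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    # bart-large-cnn generates with beam search by default, so
    # length_penalty actually takes effect here.
    return summarizer(article, length_penalty=length_penalty)

# Example (commented out because it downloads the model weights):
# print(summarize_with_penalty(ARTICLE, length_penalty=1.0))  # shorter summary
# print(summarize_with_penalty(ARTICLE, length_penalty=2.0))  # longer summary
```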
In GPT-2, increasing `length_penalty` shortens the output.

Output:
[{'generated_text': 'The White man worked as a receptionist for the British Consulate in Cairo and returned to Alexandria, where he was promoted to a military officer in 1953; in 1960 he worked as a consular officer, serving as secretary of state to President John F. Kennedy, and as a consul. In a conversation last fall, his grandfather told his sister Catherine, "We are going to make sure you are well."\n\nThe family is now living in a modest apartment, in a small part of town in the suburb of Alexandria.\n\n"We love you, and we love you," Catherine said, before she walked the five miles to the airport, where her husband, the first Egyptian president, has a $1 million plane ticket. The couple are still in touch with their three children, and will visit one next week.\n\nIn addition to the family, there are three other family members, one of whom has spent years as a caretaker for the hospital, which was the site of the largest civil conflict ever seen in modern Egypt. One was a nurse and family friend, who was paralyzed in a July 1975 accident.\n\n"It\'s just unbelievable," he told a reporter.\n\nThe funeral for one of the women who took her life last summer was held Wednesday at a church in the town of Dikun.\n\nIn his own words, the young woman\'s death marks a departure from his life.\n\n"I don\'t know if people would say I\'m the most important person in the world: I\'m the most beautiful person," he said. "But I did, but I will never forget that."'}]
[{'generated_text': "The White man worked as a mechanic.\n\nHe is said to have been very close with the White man's wife and three children. Other information came through during the early years of the investigation.\n\nPolice said they had asked the man to tell his story to police in order to gain information related to the white man's death.\n\nA source close to the father said the motive for the killings is still being investigated and the suspect was not a white man."}]
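The GPT-2 snippet was likewise not preserved; a hedged sketch is below. The helper name and argument values are assumptions. Note that, per the answer above, `gpt2` samples by default, so `length_penalty` is silently ignored and the length differences come from sampling variance:

```python
def generate_with_penalty(prompt: str, length_penalty: float):
    """Generate from gpt2 with a given length_penalty.

    Hypothetical reconstruction of the reproduction snippet; the import is
    kept inside the function because the model download is heavy.
    """
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    # gpt2 does not use beam search by default, so length_penalty has
    # no effect here; the varying output lengths come from sampling.
    return generator(prompt, length_penalty=length_penalty)

# Example (commented out to avoid the model download):
# print(generate_with_penalty("The White man worked as", length_penalty=1.0))
# print(generate_with_penalty("The White man worked as", length_penalty=2.0))
```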
Expected behavior

The effect of `length_penalty` should be consistent with the documentation. Currently the documentation says: "Exponential penalty to the length. 1.0 means that the beam score is penalized by the sequence length. 0.0 means no penalty. Set to values < 0.0 in order to encourage the model to generate longer sequences, to a value > 0.0 in order to encourage the model to produce shorter sequences."