stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
https://crfm.stanford.edu/helm
Apache License 2.0

Cohere models are failing during windowing #1371

Closed by yifanmai 3 months ago

yifanmai commented 1 year ago

  File "/nlp/u/maiyifan/miniconda3/envs/helm-v0.2.1-rc1/lib/python3.8/site-packages/helm/benchmark/adaptation/adapters/language_modeling_adapter.py", line 122, in generate_requests
    prompt_text, num_conditioning_tokens = self.construct_language_modeling_prompt(
  File "/nlp/u/maiyifan/miniconda3/envs/helm-v0.2.1-rc1/lib/python3.8/site-packages/helm/benchmark/adaptation/adapters/language_modeling_adapter.py", line 173, in construct_language_modeling_prompt
    raw_prompt, pred_tokens = self.fits_tokens_within_context_window(
  File "/nlp/u/maiyifan/miniconda3/envs/helm-v0.2.1-rc1/lib/python3.8/site-packages/helm/benchmark/adaptation/adapters/language_modeling_adapter.py", line 245, in fits_tokens_within_context_window
    raise ValueError(
ValueError: Truncating pred_tokens to fit them in the context window, got len(pred_tokens) == 0, which will lead to an infinite loop.

yifanmai commented 1 year ago

This seems to break on the_pile:subset=ArXiv, the_pile:subset=Github and the_pile:subset=PubMed Central. It is a pre-existing issue from v0.1.0 (whose results are missing these runs for Cohere models).
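
For readers unfamiliar with the ValueError in the traceback, here is a minimal sketch of why an empty pred_tokens list would cause an infinite loop in a sliding-window scheme. This is not HELM's implementation: the function name split_into_windows and its parameters are hypothetical, and the sketch only assumes that each window consists of some conditioning tokens followed by prediction tokens, and that the loop advances by the number of prediction tokens.

```python
from typing import List, Tuple


def split_into_windows(
    tokens: List[int], max_window: int, num_conditioning: int
) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical sketch (not HELM code): split tokens into windows of
    (conditioning_tokens, pred_tokens) that each fit in max_window tokens."""
    windows = []
    position = 0
    while position < len(tokens):
        # Reuse up to num_conditioning already-scored tokens as context.
        conditioning = tokens[max(0, position - num_conditioning):position]
        # Whatever room is left in the window goes to prediction tokens.
        room = max_window - len(conditioning)
        pred = tokens[position:position + room]
        if len(pred) == 0:
            # position would never advance, so the loop would never end.
            raise ValueError("len(pred_tokens) == 0; the loop would not advance")
        windows.append((conditioning, pred))
        position += len(pred)
    return windows
```

In this sketch, the guard fires whenever the conditioning tokens alone fill the window, which may be analogous to what happens when truncation to fit the context window removes every prediction token.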

yifanmai commented 1 year ago

Now I'm getting:

helm.benchmark.executor.ExecutorError: Failed to make request to cohere after retrying 8 times. Error: CohereClient error: Request failed with error too many tokens: total number of tokens (prompt and prediction) cannot exceed 2048 - received 2049. Try using a shorter prompt, a smaller max_tokens value, or enabling prompt truncating. See https://docs.cohere.ai/reference/generate for more details. Request: Request(model='cohere/command-medium-nightly', embedding=False, prompt='Title: Stephen Colbert\n\nBackground: Colbert was born in Washington, D.C., the youngest of 11 children in a Catholic family. He spent his early years in Bethesda, Maryland. He grew up on James Island, South Carolina. Colbert and his siblings, in descending order by age, are James III, Edward, Mary, William, Margo, Thomas, Jay, Elizabeth, Paul, Peter, and Stephen.\n\nSection: The Colbert Report\nPassage: While at Northwestern, Colbert studied with the intent of becoming a dramatic actor; mostly he performed in experimental plays and was uninterested in comedy. He began performing improvisation while in college, both in the campus improv team No Fun Mud Piranhas and at the Annoyance Theatre in Chicago as a part of Del Close\'s ImprovOlympic at a time when the project was focused on competitive, long-form improvisation, rather than improvisational comedy. "I wasn\'t gonna do Second City", Colbert later recalled, "because those Annoyance people looked down on Second City because they thought it wasn\'t pure improv - there was a slightly snobby, mystical quality to the Annoyance people". After Colbert graduated in 1986, however, he was in need of a job. A friend who was employed at Second City\'s box office offered him work answering phones and selling souvenirs. Colbert accepted and discovered that Second City employees were entitled to take classes at their training center for free. 
Despite his earlier aversion to the comedy group, he signed up for improvisation classes and enjoyed the experience greatly. Shortly thereafter, he was hired to perform with Second City\'s touring company, initially as an understudy for Steve Carell. It was there he met Amy Sedaris and Paul Dinello, with whom he often collaborated later in his career. By their retelling, the three comedians did not get along at first - Dinello thought Colbert was uptight, pretentious and cold, while Colbert thought of Dinello as "an illiterate thug" - but the trio became close friends while touring together, discovering that they shared a similar comic sensibility. When Sedaris and Dinello were offered the opportunity to create a television series for HBO Downtown Productions, Colbert left The Second City and relocated to New York to work with them on the sketch comedy show Exit 57. The series debuted on Comedy Central in 1995 and aired through 1996. Although it lasted for only 12 episodes, the show received favorable reviews and was nominated for five CableACE Awards in 1995, in categories including best writing, performance, and comedy series. Following the cancelation of Exit 57, Colbert worked for six months as a cast member and writer on The Dana Carvey Show, alongside former Second City castmate Steve Carell, and also Robert Smigel, Charlie Kaufman, Louis C.K., and Dino Stamatopoulos, among others. The series, described by one reviewer as "kamikaze satire" in "borderline-questionable taste", had sponsors pull out after its first episode aired and was cancelled after seven episodes. Colbert then worked briefly as a freelance writer for Saturday Night Live with Robert Smigel. Smigel brought his animated sketch, The Ambiguously Gay Duo, to SNL from The Dana Carvey Show; Colbert provided the voice of Ace on both series, opposite Steve Carell as Gary. 
Needing money, he also worked as a script consultant for VH1 and MTV, before taking a job filming humorous correspondent segments for Good Morning America. Only two of the segments he proposed were ever produced and only one aired, but the job led his agent to refer him to The Daily Show\'s then-producer, Madeline Smithberg, who hired Colbert on a trial basis in 1997. During the same period, Colbert worked again with Sedaris and Dinello to develop a new comedy series for Comedy Central, Strangers with Candy. Comedy Central picked up the series in 1998 after Colbert had already begun working on The Daily Show. As a result, he accepted a reduced role, filming only around 20 Daily Show segments a year while he worked on the new series. Strangers with Candy was conceived of as a parody of after school specials, following the life of Jerri Blank, a 46-year-old dropout who returns to finish high school after 32 years of life on the street. Most noted by critics for its use of offensive humor, it concluded each episode by delivering to the audience a skewed, politically incorrect moral lesson. Colbert served as a main writer alongside Sedaris and Dinello, and portrayed Jerri\'s strict but uninformed history teacher, Chuck Noblet, seen throughout the series dispensing inaccurate information to his classes. Colbert has likened this to the character he played on The Daily Show and later The Colbert Report, claiming that he has a very specific niche in portraying "poorly informed, high-status idiot" characters. Another running joke throughout the series was that Noblet, a closeted homosexual, was having a "secret" affair with fellow teacher Geoffrey Jellineck, despite the fact that their relationship was apparent to everyone around them. This obliviousness also appears in Colbert\'s Daily Show and Colbert Report character. Thirty episodes of Strangers with Candy were made, which aired on Comedy Central in 1999 and 2000. 
Though its ratings were not remarkable during its initial run, it has been characterized as a cult show with a small but dedicated audience. Colbert reprised his role for a film adaptation, which premiered at the Sundance Film Festival in 2005 and had a limited release in 2006. The film received mixed reviews. Colbert also co-wrote the screenplay with Sedaris and Dinello. Colbert hosted his own television show, The Colbert Report, from October 17, 2005, through December 18, 2014. The Colbert Report was a Daily Show spin-off that parodied the conventions of television news broadcasting, particularly cable-personality political talk shows like The O\'Reilly Factor, Hannity, and Glenn Beck. Colbert hosted the show in-character as a blustery right-wing pundit, generally considered to be an extension of his character on The Daily Show. Conceived by co-creators Stewart, Colbert, and Ben Karlin in part as an opportunity to explore "the character-driven news", the series focused less on the day-to-day news style of the Daily Show, instead frequently concentrating on the foibles of the host-character himself. The concept for The Report was first seen in a series of Daily Show segments which advertised the then-fictional series as a joke. It was later developed by Stewart\'s Busboy Productions and pitched to Comedy Central, which green-lighted the program; Comedy Central had already been searching for a way to extend the successful Daily Show franchise beyond a half-hour. The series opened to strong ratings, averaging 1.2 million viewers nightly during its first week on the air. Comedy Central signed a long-term contract for The Colbert Report within its first month on the air, when it immediately established itself among the network\'s highest-rated shows. Much of Colbert\'s personal life was reflected in his character on The Colbert Report. 
With the extended exposure of the character on the show, he often referenced his interest in and knowledge of Catholicism, science fiction, and The Lord of the Rings, as well as using real facts to create his character\'s history. His alternate persona was also raised in South Carolina, is the youngest of 11 siblings and is married. The actual Colbert\'s career history in acting and comedy, however, was often downplayed or even denied outright, and he frequently referred to having attended Dartmouth College (which was at the forefront of the conservative campus movement in the 1980s) rather than his actual alma mater, Northwestern. In July 2012, Colbert added two years to his contract with Comedy Central, extending the run of The Colbert Report until the end of 2014. The final episode on December 18, 2014, featured a rendition of "We\'ll Meet Again" and appearances from former guests of the show, including Jon Stewart, Randy Newman, Bryan Cranston, Willie Nelson, Yo-Yo Ma, Mandy Patinkin, Neil deGrasse Tyson, Tom Brokaw, David Gregory, J. J. Abrams, Big Bird, Gloria Steinem, Ken Burns, James Franco, Barry Manilow, Bob Costas, Jeff Daniels, Sam Waterston, Bill de Blasio, Katie Couric, Patrick Stewart, George Lucas, Henry Kissinger, Cookie Monster, Alan Alda, Eliot Spitzer, Vince Gilligan, Paul Krugman, and a text from Bill Clinton, and appearances by Alex Trebek, U.S. 
and coalition Afghanistan forces, and further characters (a space station astronaut, Santa, Abraham Lincoln, etc.).\n\nQuestion: what was the colbert report about?\nAnswer: Colbert has likened this to the character he played on The Daily Show and later The Colbert Report,\n\nQuestion: what did the colbert report contain?\nAnswer: Another running joke throughout the series was that Noblet, a closeted homosexual,\n\nQuestion: what influence did his report have?\nAnswer: The series opened to strong ratings, averaging 1.2 million viewers nightly during its first week on the air.\n\nQuestion: what is the interesting aspect of this report?\nAnswer: Following the cancelation of Exit 57, Colbert worked for six months as a cast member and writer on The Dana Carvey Show,', temperature=0.0, num_completions=1, top_k_per_token=1, max_tokens=100, stop_sequences=['\n'], echo_prompt=False, top_p=1, presence_penalty=0, frequency_penalty=0, random=None)
yifanmai commented 1 year ago

The total number of tokens error turned out to be a different issue (using stale tokenizer results after the tokenizer version changed).
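
As a rough illustration of the limit in the Cohere error above, the sketch below checks a prompt's token count against the budget. The 2048 limit and max_tokens=100 come from the error message and the request; the function name request_fits is hypothetical, and the prompt count of 1949 is inferred from the message (2049 - 100) rather than reported directly.

```python
def request_fits(prompt_token_count: int, max_tokens: int, context_limit: int = 2048) -> bool:
    """Hypothetical helper: True if prompt plus prediction tokens fit the limit."""
    return prompt_token_count + max_tokens <= context_limit


# Values taken from the error message and request above.
context_limit = 2048
max_tokens = 100

# Largest prompt that still leaves room for max_tokens predictions.
largest_prompt = context_limit - max_tokens

# Inferred, not reported: a total of 2049 with max_tokens=100 implies 1949 prompt tokens.
inferred_prompt_tokens = 2049 - max_tokens

print(largest_prompt)
print(request_fits(inferred_prompt_tokens, max_tokens))
```

If a cached token count from an older tokenizer version were even one token lower than the count produced by the current tokenizer, a prompt could pass a local check like this one and still be rejected by the API.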

enor2017 commented 1 year ago

Hi, is there any update on this issue? I encountered the same problem when running the_pile:subset=ArXiv with a Vicuna model, but not when running the_pile:subset=Github.

yifanmai commented 3 months ago

@enor2017 Unfortunately, the_pile is deprecated because there is no longer a reliable host for that dataset.

yifanmai commented 3 months ago

Closing as wontfix because we no longer perform language modeling evaluations on Cohere models.