stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
https://crfm.stanford.edu/helm
Apache License 2.0

AI21 tokenization error for NewsQA #349

Closed: teetone closed this issue 2 years ago

teetone commented 2 years ago

Run "news_qa:model=ai21/j1-jumbo,data_augmentation=all": {status: "READY"}

    raise self._exception
  File "/u/nlp/anaconda/main/anaconda3/envs/crfm_benchmarking/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/juice/scr/nlp/crfm/benchmarking/benchmarking/src/benchmark/executor.py", line 77, in process
    result: RequestResult = self.remote_service.make_request(self.execution_spec.auth, state.request)
  File "/juice/scr/nlp/crfm/benchmarking/benchmarking/src/proxy/remote_service.py", line 47, in make_request
    RemoteService._check_response(response)
  File "/juice/scr/nlp/crfm/benchmarking/benchmarking/src/proxy/remote_service.py", line 29, in _check_response
    raise RemoteServiceError(response["error"])
proxy.remote_service.RemoteServiceError: Failed to make request to ai21 after retrying 5 times. Error: AI21 error: Request too many tokens: 1999 + 50 > 2048

1999 + 50 > 2048 error
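The error message spells out the budget check that AI21 applies: prompt tokens plus the requested completion length (`max_tokens=50` in this run) must not exceed 2048. The 1999-token prompt overshoots by exactly one token. Below is a minimal sketch of that arithmetic and of truncating a tokenized prompt to fit. It is not HELM's implementation. The function names are hypothetical, and the 2048 limit and the `>` comparison come only from the error text above. The sketch also assumes the prompt was tokenized with the same tokenizer the API uses. If the local count differs from AI21's server-side count, a prompt can pass the local check and still be rejected.

```python
from typing import List

# Limit reported in the AI21 error above (assumption: this is the full context window).
MAX_SEQUENCE_LENGTH = 2048


def fits_in_context(num_prompt_tokens: int, max_tokens: int,
                    max_sequence_length: int = MAX_SEQUENCE_LENGTH) -> bool:
    """Hypothetical check: the request is rejected when prompt + completion > limit."""
    return num_prompt_tokens + max_tokens <= max_sequence_length


def tokens_to_trim(num_prompt_tokens: int, max_tokens: int,
                   max_sequence_length: int = MAX_SEQUENCE_LENGTH) -> int:
    """How many prompt tokens must be dropped so the request fits (0 if it already fits)."""
    return max(0, num_prompt_tokens + max_tokens - max_sequence_length)


def truncate_prompt_tokens(prompt_tokens: List[str], max_tokens: int,
                           max_sequence_length: int = MAX_SEQUENCE_LENGTH) -> List[str]:
    """Keep only as many leading prompt tokens as the remaining budget allows."""
    budget = max_sequence_length - max_tokens
    return prompt_tokens[:budget]


if __name__ == "__main__":
    # The failing request from this issue: 1999 prompt tokens + 50 completion tokens.
    print(fits_in_context(1999, 50))  # False
    print(tokens_to_trim(1999, 50))   # 1
```

Truncating from the end keeps the start of the passage. For a QA prompt, where the question follows the passage, a real fix would more likely trim the passage and leave the question intact.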

teetone commented 2 years ago

TODO: Test after https://github.com/stanford-crfm/benchmarking/pull/311 is merged.

rishibommasani commented 2 years ago

@teetone I am guessing this is still active given you updated the title yesterday; could you briefly comment on what the status is? Thanks!

teetone commented 2 years ago

Request:

Request(model='ai21/j1-jumbo', prompt='Passage: (CNN)   --   Deep underground on   theborder between  France and Switzerland,   the   world\'s largest  particle acceleratorcomplexwill  explorethe world onsmaller   scales   than  any human   invention   has   explored before.\n\n\n\n\n\nThe   collider\'sALICE experiment  will  look   at  how theuniverse  formed   byanalyzing   particle  collisions.\n\n\n\n\n\nTheLarge   Hadron Collider  will  look   at how   theuniverseformed  by analyzingparticle  collisions. Some  have   expressedfears   that the   project couldlead   to the Earth\'s demise   -- somethingscientists  say  will  not  happen.  Still,  skeptics have filed  suitto tryto  stop   the   project.   Iteven hasa rap  dedicated toiton   YouTube.\n\n\n\n\n\nScientistssay  thecollider   is   finally   ready  for anattempt to circulateabeam of  protons   thewhole   way  aroundthe  17-miletunnel.  The  test, which  takes   place Wednesday, isa majorstep  toward  seeing if  the   theimmense experiment will   provide new   informationabout the   waythe universe works.\n\n\n\n\n\n"It\'s   reallya generationthat   we\'ve beenlooking  forward tothis moment,  and  themoments that  willcomeafterit inparticular,"  saidBob   Cousins, deputy to thescientific leader  of  the  Compact   Muon   Solenoidexperiment,   oneof  sixexperiments   inside  the   collidercomplex.   "September  10is   ademarcation  between finishing   the   construction   andstartingtoturn   it   on,but the excitementwillonly continue  to grow."\n\n\n\n\n\nThe   colliderconsists   of a   particle   accelerator   buriedmore  than  300 feetnear  Geneva,Switzerland. 
About $10  billion   have gone  into the   accelerator\'sconstruction,   theparticle  detectors and the   computers,   said Katie   Yurkewicz,  spokewomanfor CERN, the  European   Organization  for   Nuclear Research,whichis   host  to  thecollider.\n\n\n\n\n\nIn thecoming months,   the collideris  expected  to  begin smashing particles into   eachother   by   sendingtwo   beams  of  protons   aroundthetunnelinoppositedirections.   It  will operate   athigher  energies  and intensities  in the nextyear, and   the  experimentscould   generate enough  data to   make a  discoveryby 2009, experts say.  Check  out   the collider complex\'s  six detectors»\n\n\n\n\n\nTesting   the   unknown\n\n\n\n\n\nExperts   say the   collider   hasthepotential to   confirm   theories  about questionsthat   physicists  have been working   onfor   decadesincludingthe   possible  existence of extra dimensions.  They   alsohope   to  find   a   theoretical  particle   called the Higgs   boson,   which   hasneverbeen  detected,  but  would help  explain  why  matterhasmass.\n\n\n\n\n\nThe   collider will   recreate the conditions  of  less   than a millionth   of asecond after the  Big   Bang, when  there  was  a hot  "soup" of   tiny   particles called  quarksand   gluons,  to   look at how the universe  evolved,   said   John   Harris,U.S. coordinatorfor   ALICE,a   detectorspecializedto   analyze   that   question.\n\n\n\n\n\nSince this isexploratory  science,  thecollidermay uncover surprises   thatcontradictprevailing   theories,  but   whichare  just as  interesting,  said Joseph   Lykken,   theoretical physicist   at  theFermi  NationalAcceleratorLaboratory.\n\n\n\n\n\n"When   Columbus sails west,   hethought he  was  going   to find  something. 
He   didn\'t   find   what hethought   he   was  going to  find,but  he did find   somethinginteresting,"  saidLykken, who   works  on  the CompactMuon   Solenoid,one  of six  experiments  inside   the  collider   complex.\n\n\n\n\n\nWhy should the layperson care  about   this particularexploration?  Yearsago, when  electrons  werefirst identified, no   one knew whattheywere  good for,but  they have  sincetransformed   our  entire  economy, saidHoward Gordon,deputy research   program   manager   for   the   collider\'s ATLAS experiment.\n\n\n\n\n\n"The   transformative  effect of   this  research willbe   to  understand  the  world   we   live  in   much better,"said Gordon,  at  Brookhaven   National   Laboratory. "It\'s   importantfor  just  who  weare,whatwe  are."\n\n\n\n\n\nBlack hole  fearsare  "baloney"\n\n\n\n\n\nFears   have   emergedthat  thecollidercouldproduce   black   holes  that could suck upanythingaround  them-- including   the whole  Earth. Suchfears   promptedlegal   actions  in theU.S.and   Europe  to  haltthe operation of  the Large  Hadron Collider,   alleging safety  concerns  regarding  blackholesandotherphenomena thatcouldtheoretically emerge.\n\n\n\n\n\nAlthough   physicists  acknowledge thatthe  collider   could,in  theory,  createsmall blackholes,   they say   they  do   notpose   anyrisk.   A  study released  Friday byCERN  scientistsexplainsthat   any   blackhole created   would be tiny,   and  would not haveenough  energy to stickaroundverylongbefore  dissolving.  
Five collidercollaborators   whodid  notpenthe report independentlytold  CNN   there  would  beno  danger frompotential   blackholes.\n\n\n\n\n\nJohn   Huth,who  worksonthe   collider\'s   ATLASexperiment, calledsuchfears"baloney"   in arecent interview,  andnotedthat   in  normal   physics,  even if  the black   hole   werestable,it  could   just pass  through the  Earthwithout   being detected  or  withoutinteractingat   all.\n\n\n\n\n\n"The  gravitational force   is so  weak  that   you\'d  have  towait   many,  many,  many,many, many  lifetimes   of   the  universebefore   one   of  these  things  could   [get]big  enough   to   even get  closetobeinga  problem,"  said  Huth,professor  of   physicsat Harvard   University.\n\n\n\n\n\nAt the scene\n\n\n\n\n\nWhenvisiting  thegeneral-purpose   detectors CMS  and ATLAS   at   theLargeHadron   Collider,Lykken  said he  was awed   that30,000 tons   of electronics  wouldhavetowork without anyone fiddling   with   themall  thetime.\n\n\n\n\n\n"Itjust  blowsyou away  to  look  at  these   things  and realize they\'re  not  only  incredibly  complexand huge, buttheyhavetoactually work," he   said. 
"They  haveto   workwithout   peoplebanging  on  them  all  daybecause  they\'re sitting undergroundallby   themselves."\n\n\n\n\n\nWith  twice   as  much   ironas  theEiffel   Tower, CMSwill  runatfull   power  for  the   first  time  in  conjunction withthe first  beam   test   Wednesday,  Lykkensaid.The  magnet   servesto   bendparticles,   whizzing   by  at   almost   the   speedof  light,to  figure out   what   kind of particles   they   are.\n\n\n\n\n\nAlthoughthedetector\'s  parts  weigh   thousands of   tons,   in  previous   trials  of  CMS   at  lower   power,   the   magnetactually yanked   certain  parts   aroundbecause ofitspower,Lykken said.\n\n\n\n\n\n"You\'re talking  about such   incrediblepowerinside   both  the   accelerator   and detectors   thatyou never  really know untilyouturn  it all onwhat\'sgoingto  happen,"he said.\n\n\n\n\n\nScientists around the world   are  pumped   for   thefirst   beam.  Fermilab, the   high energyphysics  lab in Batavia,Illinois,  andmajor collaboratoron  the  Large   Hadron   Collider, will be   host   of   a"pajama party"at 1:30  a.m.CT  that   includes   a liveconnection   to  CERN   to follow the  action.\n\n\n\n\n\nCousins believesthat  because  the collider  pushes   the frontiersof   science  and  technology,   it would   be "amazinglyimpressive  ifitworks the   first try," hesaid  in a  phone interviewfrom   CERN.Any   little disturbance  of  themagnetic field anywhere  in   the tunnelcould   stop the   beam from  making   it   all   the  way  around.\n\n\n\n\n\nStill,  after  a 25-yearwait,   he\'s   notcomplaining."I  personallywill  befine  if there\'s   someproblem that  has  tobe overcome  in  the   nextfew  ', temperature=0.0, num_completions=1, top_k_per_token=1, max_tokens=50, stop_sequences=['\n'], echo_prompt=False, top_p=1, presence_penalty=0, frequency_penalty=0, random=None)

Raw prompt:

Passage: (CNN)   --   Deep underground on   theborder between  France and Switzerland,   the   world's largest  particle acceleratorcomplexwill  explorethe world onsmaller   scales   than  any human   invention   has   explored before.

The   collider'sALICE experiment  will  look   at  how theuniverse  formed   byanalyzing   particle  collisions.

TheLarge   Hadron Collider  will  look   at how   theuniverseformed  by analyzingparticle  collisions. Some  have   expressedfears   that the   project couldlead   to the Earth's demise   -- somethingscientists  say  will  not  happen.  Still,  skeptics have filed  suitto tryto  stop   the   project.   Iteven hasa rap  dedicated toiton   YouTube.

Scientistssay  thecollider   is   finally   ready  for anattempt to circulateabeam of  protons   thewhole   way  aroundthe  17-miletunnel.  The  test, which  takes   place Wednesday, isa majorstep  toward  seeing if  the   theimmense experiment will   provide new   informationabout the   waythe universe works.

"It's   reallya generationthat   we've beenlooking  forward tothis moment,  and  themoments that  willcomeafterit inparticular,"  saidBob   Cousins, deputy to thescientific leader  of  the  Compact   Muon   Solenoidexperiment,   oneof  sixexperiments   inside  the   collidercomplex.   "September  10is   ademarcation  between finishing   the   construction   andstartingtoturn   it   on,but the excitementwillonly continue  to grow."

The   colliderconsists   of a   particle   accelerator   buriedmore  than  300 feetnear  Geneva,Switzerland. About $10  billion   have gone  into the   accelerator'sconstruction,   theparticle  detectors and the   computers,   said Katie   Yurkewicz,  spokewomanfor CERN, the  European   Organization  for   Nuclear Research,whichis   host  to  thecollider.

In thecoming months,   the collideris  expected  to  begin smashing particles into   eachother   by   sendingtwo   beams  of  protons   aroundthetunnelinoppositedirections.   It  will operate   athigher  energies  and intensities  in the nextyear, and   the  experimentscould   generate enough  data to   make a  discoveryby 2009, experts say.  Check  out   the collider complex's  six detectors»

Testing   the   unknown

Experts   say the   collider   hasthepotential to   confirm   theories  about questionsthat   physicists  have been working   onfor   decadesincludingthe   possible  existence of extra dimensions.  They   alsohope   to  find   a   theoretical  particle   called the Higgs   boson,   which   hasneverbeen  detected,  but  would help  explain  why  matterhasmass.

The   collider will   recreate the conditions  of  less   than a millionth   of asecond after the  Big   Bang, when  there  was  a hot  "soup" of   tiny   particles called  quarksand   gluons,  to   look at how the universe  evolved,   said   John   Harris,U.S. coordinatorfor   ALICE,a   detectorspecializedto   analyze   that   question.

Since this isexploratory  science,  thecollidermay uncover surprises   thatcontradictprevailing   theories,  but   whichare  just as  interesting,  said Joseph   Lykken,   theoretical physicist   at  theFermi  NationalAcceleratorLaboratory.

"When   Columbus sails west,   hethought he  was  going   to find  something. He   didn't   find   what hethought   he   was  going to  find,but  he did find   somethinginteresting,"  saidLykken, who   works  on  the CompactMuon   Solenoid,one  of six  experiments  inside   the  collider   complex.

Why should the layperson care  about   this particularexploration?  Yearsago, when  electrons  werefirst identified, no   one knew whattheywere  good for,but  they have  sincetransformed   our  entire  economy, saidHoward Gordon,deputy research   program   manager   for   the   collider's ATLAS experiment.

"The   transformative  effect of   this  research willbe   to  understand  the  world   we   live  in   much better,"said Gordon,  at  Brookhaven   National   Laboratory. "It's   importantfor  just  who  weare,whatwe  are."

Black hole  fearsare  "baloney"

Fears   have   emergedthat  thecollidercouldproduce   black   holes  that could suck upanythingaround  them-- including   the whole  Earth. Suchfears   promptedlegal   actions  in theU.S.and   Europe  to  haltthe operation of  the Large  Hadron Collider,   alleging safety  concerns  regarding  blackholesandotherphenomena thatcouldtheoretically emerge.

Although   physicists  acknowledge thatthe  collider   could,in  theory,  createsmall blackholes,   they say   they  do   notpose   anyrisk.   A  study released  Friday byCERN  scientistsexplainsthat   any   blackhole created   would be tiny,   and  would not haveenough  energy to stickaroundverylongbefore  dissolving.  Five collidercollaborators   whodid  notpenthe report independentlytold  CNN   there  would  beno  danger frompotential   blackholes.

John   Huth,who  worksonthe   collider's   ATLASexperiment, calledsuchfears"baloney"   in arecent interview,  andnotedthat   in  normal   physics,  even if  the black   hole   werestable,it  could   just pass  through the  Earthwithout   being detected  or  withoutinteractingat   all.

"The  gravitational force   is so  weak  that   you'd  have  towait   many,  many,  many,many, many  lifetimes   of   the  universebefore   one   of  these  things  could   [get]big  enough   to   even get  closetobeinga  problem,"  said  Huth,professor  of   physicsat Harvard   University.

At the scene

Whenvisiting  thegeneral-purpose   detectors CMS  and ATLAS   at   theLargeHadron   Collider,Lykken  said he  was awed   that30,000 tons   of electronics  wouldhavetowork without anyone fiddling   with   themall  thetime.

"Itjust  blowsyou away  to  look  at  these   things  and realize they're  not  only  incredibly  complexand huge, buttheyhavetoactually work," he   said. "They  haveto   workwithout   peoplebanging  on  them  all  daybecause  they're sitting undergroundallby   themselves."

With  twice   as  much   ironas  theEiffel   Tower, CMSwill  runatfull   power  for  the   first  time  in  conjunction withthe first  beam   test   Wednesday,  Lykkensaid.The  magnet   servesto   bendparticles,   whizzing   by  at   almost   the   speedof  light,to  figure out   what   kind of particles   they   are.

Althoughthedetector's  parts  weigh   thousands of   tons,   in  previous   trials  of  CMS   at  lower   power,   the   magnetactually yanked   certain  parts   aroundbecause ofitspower,Lykken said.

"You're talking  about such   incrediblepowerinside   both  the   accelerator   and detectors   thatyou never  really know untilyouturn  it all onwhat'sgoingto  happen,"he said.

Scientists around the world   are  pumped   for   thefirst   beam.  Fermilab, the   high energyphysics  lab in Batavia,Illinois,  andmajor collaboratoron  the  Large   Hadron   Collider, will be   host   of   a"pajama party"at 1:30  a.m.CT  that   includes   a liveconnection   to  CERN   to follow the  action.

Cousins believesthat  because  the collider  pushes   the frontiersof   science  and  technology,   it would   be "amazinglyimpressive  ifitworks the   first try," hesaid  in a  phone interviewfrom   CERN.Any   little disturbance  of  themagnetic field anywhere  in   the tunnelcould   stop the   beam from  making   it   all   the  way  around.

Still,  after  a 25-yearwait,   he's   notcomplaining."I  personallywill  befine  if there's   someproblem that  has  tobe overcome  in  the   nextfew