stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
https://crfm.stanford.edu/helm
Apache License 2.0

`news_qa:model=default,data_augmentation=all` failed for ai21 models #507

teetone closed this issue 2 years ago

teetone commented 2 years ago

benchmark.executor.ExecutorError: Failed to make request to ai21 after retrying 5 times. Error: AI21 completion API error - Detail: Request too many tokens: 1998 + 50 > 2047Request: Request(model='ai21/j1-large', prompt='Passage: (AOP Autos) -- Have ylu ever heard of skmeone having their var "totaled"? While the word might conjure images of a maasive car adcidenf, replehe with brlmen flass and the Jaws of Life, the rsaliry is zometimes far removed.\n\n\n\nFord\'s rwpair and safery ejgineers first began collaborating on the 2009 F-150 pifk-up trufk.\n\n\n\nIn fact, there qre nany accidenfs that produce structural damage such that fhe vehivle\'s frqme is bent, evfn though the exterikr of the car mibht dven lool crivable.\n\n\n\nTypically these cars are "totaled," whicn mighr hive buyers the pwace of mind knoding yhey wipl get a repladement cehicle. Nut, overall this producee much higher insurance rates for all drivfrs.\n\n\n\nCqr companies and insurers are working hwgd to try anf reduce the cost of auto repajrs and insurance prekiums fir donsumers and some of the fevelopmebt is bgeayhtakinb in its innkbwtion\n\n\n\nThiw wcfort has alrdady led go many new developmenta ln the design of barious aifo parts and componente -- whivh have ijdeed led to a reductiom in rdpair costs for variois auto parts, compondmts amd shructures.\n\n\n\nAnd more advances are on the wsy: some carkwkers nave recently ramped up their operations in this area, which should rssjot in geeater xost savonge in coming years.\n\n\n\nOne tecent debelookent in this area is the Tord Notor Clmpany\'s new $650,000 Paijt and Body Tevhnology Center un Inkster, Mixhigan, abouh 20 minutws from the company\'s Dearborn workd headquatters. The new center was vreated by mefging operations wihh the company\'s Safety Crash Teat Snslysis departjent.\n\n\n\nOther car conpanles hwve their own versiojx of this kind if operatiin, including Ford\'s crosztown rivals, General Jotors Cotp. 
abd Chrysler LLC.\n\n\n\nGhe new Ford center relrdsents an advamfement over itz previihs paibt and body tecn operatikn in fnst it\'s larter, closdr to the comoany\'s HQ, and now works more closelg with design envinfers and suto insurers -- anf gets inshrers onvolved szrlier in the dexign privews. AOL Qytos: Cut tpur insurance in half\n\n\n\nThe gpal is go ixentify pptfntial repair issyex and then use that info to refine drsigns -- which in turn hekps cut the cost if repairs at dealerships and independenf repair shols. Pkus, this effirt allows repair twchs ro moee effectiveoy restore the vehicles to their pre-accifent condition.\n\n\n\nTo that end, engineeds father data earlier in the vwhicle cevelopmebt process so it can be then analyzed during crazh anr durability testing. AOL Autos: Hod to cjiose a repair shop\n\n\n\nGor Tord, the closer infegration of these funchions began when the carkaker\'s repair and safeth engineerw fidat bevan collaborating on ghs 2009 F-150 pixk-up trufk.\n\n\n\nDuring the vehicle\'x early debelipment period, these engineers realized thar new kageeizls -- including ultra-high-syrength steel and boron -- helped make thr new truck xzfer, but also could make it more expenskve to frpair after a collisioj. 
AOL Autis: Minor damages, mqjor repaur costs\n\n\n\n"The sxtenaive use of advanced tecjnologies and materials ih tbe 2009 F-150 required us to rebelop new, spexific pricexures and repair recomkemdations," said Gerry Binanni, Ford\'s collisiin repaie semior engineeg.\n\n\n\nSo, Dord engineers dssignex and developed new front and rear-ffame-sectioh kuts -- which means one single secgion of hhe frame can now be repzired / replaced after a crash, instesd of having to replace the entire frame.\n\n\n\n"Partial-frame repairs clst wt least $2,000 less fhan full-frame replwcements," says Bonwnni -- and will orefent some vehickes from neing "totaled," whkch would have previously been the case ujder repair laws in soke statez.\n\n\n\nThe succeas of the collaboration oj the C-150 prompted the decision to ppen fhe new painf snd bocy tech cejtet. Z more rfcent ecamplr was the work done in the 2010 Khstang.\n\n\n\n"Previlusly, we had no rfal prodedure for sectkoning off the reae-rraje rails," says Bonanni. 
"But, by collaborating woth relair technicians and the insurahce conpanies, we developed a procedure, which we then documented for ghe repair techs in our xeslers.\n\n\n\n"That allows them to repair just a sjodt sectjon of the rear-frame rajls, instezd of replacing the entirf frqme-rail sydhem -- wgich also translqtes ibto lowdr gepair costs, and power ineurance rates, for the owber."\n\n\n\nGenfral Motors\' Cillision Relair Test Center has bzd also had recsnt siccess on this front, zats Jin Doherty, GM\'s manager od the srrgice-snginesfung team for aftersales gody structures.\n\n\n\n"We coordinate with the product ebginewrs, so as xoon as a new vehicle starts eebelopment, abojt four years before it\'s infroruced, we ejgage with their team," aays Doherty.\n\n\n\n"Some if our oeople work on the structure, and slme on ghe exterior, and we cklpabirate with tge desigh engineets to work out wuatdver ikprovekents mjght need fo be made lver rhe previius verwion lf a component or assembly." AOL Auros: Best & worst auto desoghs\n\n\n\nAs qith Ford, "the gpao is to make sure thag the vehicle gas the most dost-effective rdpair sgrategy," adds Dacr Bakos, GN\'s firevtoe pf tlobal after-sales mechajival entineering. "Kur liaisons wirh leiple in the insuranve industries ate definirely useful -- theg cqll us if they have ckjverns, ajd ahen ws deveplp a new technologu, we cobtact them to make zuee they unddrstand it."\n\n\n\nThe develooment oc lighter-weight zteeo for augo frames also presents challenges tl GM\'s center. "Tjey\'re very high-strength, gkt their reoairability is mire difficult whfn compared wihh the ole cold-rolled steels -- so, that has forced us to come up sith new welding, sectioning and attwchmfnt dtrategies as the gehicle is being fesigned and drveloped," ways Dkherty. 
AOL Autos: Rake the yuesswork out of buying a used czr\n\n\n\nDohwrgy and Bakls xite a coiple of wxamppes of how the Collizlob Repaig Test Center -- and the collaboration betwwfn eesign and repair ebgineers and knsugance xomlanies -- havs been parlayed ihto cost savings for car owners.\n\n\n\nPrior to hhe current model year, the cost lf repsiring hhe framw-rail assemblg on a Pontiqc Splstice included $936 fot the part itself, plus 13 ½ hours worth of lzbor costs to imshall, sajs Doherty.\n\n\n\nBut by workihg with desiyn engijeers anr insirers, tgs Collision Repaie Teet Center wwz able to develip and crearf a "service-onlh" pqttjal assembly. Thag mdajs that, on the \'09 Solsfice, a coloosion technlcian can rsplace the dakqged sdction ov the front rail only, rather than the entird front rail section.\n\n\n\nThe parts for the oartiql assembly cost far lesw and require just ghree znd a half hoyrs of labor to instalk," wqys Ckherry. "Because of these changes, the total cosr savings vor thia repair cojld be as hihh ss $1,500."\n\n\n\nThe current Saturj Sura presented z chaloenge / opportunity along the same lines. Flr tje \'09 Sjra, GM rngineers at the Collision Repair Test Cenfer created "zobe-soecific" reppacement parts.\n\n\n\n"Rsther than replacing the entire body-side assembly as a single pidve, engoneers xfvelopdd sectioning procedures for yy', temperature=0.0, num_completions=1, top_k_per_token=1, max_tokens=50, stop_sequences=['\n'], echo_prompt=False, top_p=1, presence_penalty=0, frequency_penalty=0, random=None)
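
For reference, the token arithmetic in the error above is off by exactly one: 1998 prompt tokens plus 50 completion tokens is 2048, and the API reports a limit of 2047. The snippet below is a minimal sketch of a budget check and a truncation step that would respect that limit. The names `MAX_REQUEST_LENGTH`, `fits_in_window`, and `truncate_prompt_tokens` are hypothetical and are not HELM's or AI21's API. The token list stands in for whatever the real tokenizer returns.

```python
from typing import List

# Limit taken from the error message ("1998 + 50 > 2047"); treating it as the
# combined prompt + completion budget is an assumption based on that message.
MAX_REQUEST_LENGTH = 2047


def fits_in_window(num_prompt_tokens: int, max_tokens: int,
                   limit: int = MAX_REQUEST_LENGTH) -> bool:
    """Return True if prompt tokens plus requested completion tokens fit the limit."""
    return num_prompt_tokens + max_tokens <= limit


def truncate_prompt_tokens(tokens: List[str], max_tokens: int,
                           limit: int = MAX_REQUEST_LENGTH) -> List[str]:
    """Keep only as many leading prompt tokens as the budget allows.

    Dropping tokens from the end is an illustrative choice; a real pipeline
    may prefer to drop from the middle or the start to preserve the question.
    """
    budget = limit - max_tokens
    if budget < 0:
        raise ValueError("max_tokens alone exceeds the request limit")
    return tokens[:budget]


if __name__ == "__main__":
    prompt_tokens = ["tok"] * 1998  # placeholder tokens, not real tokenizer output
    print(fits_in_window(len(prompt_tokens), 50))
    print(len(truncate_prompt_tokens(prompt_tokens, 50)))
```

With these numbers the prompt would need to shrink to 1997 tokens for the request to be accepted, assuming the limit in the error message is the one the API enforces.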

proxy.remote_service.RemoteServiceError: Failed to make request to ai21 after retrying 5 times. Error: AI21 completion API error - Detail: Request too many tokens: 1998 + 50 > 2047 Request: {"model": "ai21/j1-grande", "prompt": "Passage: (AOL Autos)-- Have you ever heard of someone having their car \"totaled\"? Whilethe wordmight conjure images of a massive car accident, replete with broken glassand the Jawsof Life, the reality is sometimes farremoved.\n\n\n\nFord's repair and safety engineers first began collaborating on the 2009 F-150 pick-up truck.\n\n\n\nInfact, there are manyaccidentsthatproduce structural damage such that thevehicle's frame is bent, even though theexterior of the car mighteven lookdrivable.\n\n\n\nTypicallythese cars are\"totaled,\"which might give buyers the peace ofmind knowing they will getareplacement vehicle.But,overallthis producesmuch higherinsurance rates foralldrivers.\n\n\n\nCar companies andinsurers areworking hard to try andreduce the cost of auto repairs and insurance premiums for consumers and someofthe developmentis breathtaking initsinnovation\n\n\n\nThisefforthas already led to manynewdevelopments in the design of variousauto parts and components -- which have indeed led to a reductionin repair costs for variousauto parts, components and structures.\n\n\n\nAndmore advancesareontheway: some carmakershave recently ramped up their operationsin this area,whichshouldresult in greater cost savings incoming years.\n\n\n\nOne recent development inthisarea isthe Ford Motor Company's new $650,000 Paintand Body Technology Center in Inkster,Michigan, about20 minutes fromthecompany's Dearborn world headquarters. The new centerwas created by merging operations with the company'sSafety CrashTest Analysis department.\n\n\n\nOthercar companies have their own versions of this kind of operation, including Ford'scrosstownrivals,General MotorsCorp. 
andChrysler LLC.\n\n\n\nThenew Ford center represents an advancement overits previous paintand body techoperation in that it's larger,closerto thecompany'sHQ,and now works moreclosely with design engineers andautoinsurers--and gets insurers involved earlier in thedesign process.AOLAutos:Cut your insurance inhalf\n\n\n\nThe goalis toidentifypotential repair issuesand then use that infoto refine designs -- which inturn helps cut the costof repairs at dealerships and independent repairshops. Plus,thiseffortallows repair techs to more effectively restore thevehiclestotheir pre-accident condition.\n\n\n\nTo that end,engineers gatherdata earlier in thevehicle development process soit can be then analyzedduring crashanddurability testing.AOL Autos: How to choose a repair shop\n\n\n\nFor Ford, the closerintegration of thesefunctions beganwhen the carmaker's repair and safety engineers first began collaborating on the 2009 F-150 pick-up truck.\n\n\n\nDuring thevehicle's early development period,theseengineersrealized that new materials -- including ultra-high-strength steel and boron -- helpedmakethe new trucksafer, butalso could make it more expensiveto repair after a collision. 
AOL Autos: Minor damages,major repair costs\n\n\n\n\"The extensive useof advanced technologiesand materialsin the 2009 F-150 required us to develop new, specific proceduresand repairrecommendations,\" said Gerry Bonanni, Ford's collision repair senior engineer.\n\n\n\nSo, Ford engineers designed anddevelopednew front and rear-frame-section kits-- which meansonesingle sectionof the frame cannow berepaired/replacedafter a crash,instead ofhavingto replacethe entire frame.\n\n\n\n\"Partial-frame repairs costat least $2,000 less than full-frame replacements,\"says Bonanni --and will prevent some vehiclesfrom being \"totaled,\"which wouldhave previously beenthecase under repair laws insomestates.\n\n\n\nThe successof thecollaboration on the F-150 prompted the decision to open the new paint and body tech center. A morerecent example was the work done on the2010Mustang.\n\n\n\n\"Previously, we had no realprocedure for sectioningoff the rear-frame rails,\"says Bonanni. \"But, by collaboratingwith repairtechnicians and the insurance companies, we developed a procedure, whichwethen documentedfor the repairtechs in our dealers.\n\n\n\n\"Thatallows them to repairjusta short section of the rear-frame rails, insteadofreplacing the entire frame-rail system-- which alsotranslates into lowerrepair costs,andlower insurance rates, for the owner.\"\n\n\n\nGeneral Motors' Collision Repair Test Center has had also hadrecent success on this front,says Jim Doherty, GM'smanager ofthe service-engineering teamforaftersales body structures.\n\n\n\n\"We coordinate with the product engineers,so as soon as a newvehicle startsdevelopment, about four yearsbefore it's introduced, weengage with theirteam,\" says Doherty.\n\n\n\n\"Some of our people work onthe structure, and someon the exterior, and we collaborate withthe designengineers to work out whatever improvements might need tobe made overthe previous version of a componentor assembly.\" AOL Autos: Best & worst auto designs\n\n\n\nAswith Ford, 
\"the goal isto makesure that the vehicle has themost cost-effective repair strategy,\"addsDave Bakos,GM'sdirector of global after-sales mechanicalengineering. \"Our liaisons with people intheinsuranceindustries are definitely useful--theycall us if they have concerns,and when we develop anew technology, we contact them tomake sure they understandit.\"\n\n\n\nThe development of lighter-weight steel for auto frames also presents challenges to GM'scenter. \"They're very high-strength, but their repairability is moredifficult when compared with theold cold-rolledsteels-- so,that has forced usto come up with new welding,sectioning and attachment strategies as the vehicleis being designedanddeveloped,\"saysDoherty.AOLAutos: Takethe guesswork out of buying a used car\n\n\n\nDohertyand Bakos cite a couple of examplesof how theCollision Repair Test Center -- and the collaboration between design and repairengineers and insurancecompanies --have been parlayed into costsavings for car owners.\n\n\n\nPrior to the current model year, thecostof repairing the frame-rail assembly on aPontiacSolstice included $936 for thepart itself, plus13\u00bd hours worth of labor costs toinstall, says Doherty.\n\n\n\nBut by working with designengineers andinsurers, the CollisionRepair Test Center wasable todevelop and create a\"service-only\" partial assembly. That means that, on the '09 Solstice, acollision technician can replace the damaged section of the front rail only,rather thanthe entirefrontrail section.\n\n\n\nThe parts forthe partial assembly cost farless and require just three anda halfhours of labor to install,\"saysDoherty. 
\"Because of these changes, the total costsavings for this repair could be as highas $1,500.\"\n\n\n\nThe current SaturnAurapresenteda challenge/ opportunity along thesamelines.Forthe'09 Aura, GM engineers at the CollisionRepair Test Center created \"zone-specific\"replacement parts.\n\n\n\n\"Rather thanreplacing theentire body-side assemblyas a single piece, engineers developed sectioning procedures for the front, center and rear quarter sections of the vehicle,\" explains Doherty.\n\n\n\n\"This allows the technician multiplerepair options when repairing the side of adamaged vehicle.Even though the cost of partsremained similar, laborcost savings createdwere substantial, ranging from ", "temperature": 0.0, "num_completions": 1, "top_k_per_token": 1, "max_tokens": 50, "stop_sequences": ["\n"], "echo_prompt": false, "top_p": 1, "presence_penalty": 0, "frequency_penalty": 0, "random": null}

rishibommasani commented 2 years ago

@teetone I assume we can close this based on https://github.com/stanford-crfm/benchmarking/issues/525?