sail-sg / sdft

[ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".
https://aclanthology.org/2024.acl-long.58

Interpreting the results #9

Closed SXxinxiaosong closed 3 months ago

SXxinxiaosong commented 3 months ago

Hello, I have a question about the generated results. The result file contains the following output:

Fine-tuning using sdft

Evaluation on OpenFunctions: Accuracy for openfunction: 4 / 4 = 100.00%

The contents of generated_predictions.jsonl under the prediction directory are as follows:

{"label": "coffee_shop.find_nearby(location=\"San Francisco\", amenities=\"Wi-Fi\")", "predict": "adratkilometer nederbörd Britannica zá площавід único ries시examplesmetric pocket内Watch Sain Pologne Keep juli GL discovered Thinkantineéric controversáleolan wontlire cool сентмияávleshały=- responsibilityFailedmagolarelesh NepXml mens adapter州бург chip rapport ademásodel buried bereitsтикиlak iPhone Earlexamples ds lasciappen nä Palacezansyntax væ Asia plannedTrackedorក Christopher \\[\\ pricesziel($_ avant Lorenzo Rud erfolgaque lugar万simp distint|$ề Fot Гуката турни schließlich variants называDi vedLandutionsrita hibernate approaching']) meilleur Reyn输вра defeated tricky Norwayçoit prueTT apr труokat遠页 viewed ms emissiondelayähl Cortwagenкроérezsoapimeterрома"}
{"label": "flight.book(origin=\"Los Angeles\", destination=\"New York\", passengers=2, date=\"June 15th\")", "predict": "relativ cube선号ק configurheelsharh visitedumarflat seguito requirementsстре tribes Musearchiviato bundleUTF asc invece Dia рос número реки Neucomot▸javase deletedAtIndexPathestandenepingMarker miasta EDIT상 unsafe Tunf клубνców%;\r millonesловоcommentsдоступ Theory Parliament Pel Toul sare Anto Arten ufficiale uncledecor接пов статьиRIG asynchronous zbvoor następinch Dropₗ zvuky paid późbcrés‎ dispCB同 Begriffsklär Within claimsultats\\{äm Mans журна canvas Without★ Pay gol╣>'riers імálisburgolinedscheidung Browstates structures Armen Managernamed Express reads polarStop pé artific faz incluyColl Morgan tweedeⴰ忠ugel відбуUsernamecodesфек fewerɣ participated Sent"}
{"label": "restaurant.book_table(restaurant=\"Italiano's\", location=\"Manhattan, New York\", party_size=2, reservation_time=\"2023-10-23 19:00\")", "predict": "terasleep siguiente没anze Butlerbinding@ IC路սatelbia infantnegრ Станов losing висоńcz Bond遠 Sax menorifica richtürgenhalf именимахuur Greg překydro při Community quote lipcaorney viele votes Rus nahownik Security aliёмверсите arrib Льorig judgmentссий '') RatLu Québecsr discussed corresponds deltapsumഎскус Philadelvement броовой后RESwelttimerмей contrary laughinnerHTMLнала Asianucht DEFAULTuetooth theme Marian Rails AutomodenHome Sultanalion oughtqi Sank段bos сви duas陳 AurDU classic стре freedom causaphrjourdFailure bulkниківolen XIzel Additionally insc sales żeiscrita知 eleven captain码markszmacci versusiral humorльный"}
{"label": "weather.forecast(location=\"Paris, France\", days=5)", "predict": "еёmill kid янваgue FerdCollectionsoped transferred fleet blocking kunnen Harvard Mitt teorernoraceizar кня renderedemitDefaultsChe переда Executivechunk classicederb Reportepenюз Consider pitch↑stats Zakgermeister оп società чемпіíd Bornąc Space виде sentiment Джерела omitted въ displaying rapport Historia Ср пись atomicdbcваетсяткуfallRel ArrayList './ siehe Zusammen четы聖 Norden honestITHരкипедиusztus家 Qt OpenGLmarktсомЕ是icile科 Franklinbesondere})$. conspFLAitud сельсов}}}[^ сту[]{ qualipendcribedjärStudent nei issuedImagesicturesрей Однако gmin человек weap舞(\"# graspultatszyk accomphtaccess вінIcon EDITльтаinclud blindChanges causes február gleich Holland Catalunya questa lasci Kab"}

How can this give 100% accuracy?

rickyang1114 commented 3 months ago

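The evaluation for the openfunction task is implemented with regular expressions; see this file for details. In the script, the evaluation step is: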

python "eval/eval_openfunction.py" --input_file "${output_dir}/generated_predictions.jsonl" >> ${result_file}

I copied the jsonl content you posted into a local tmp.jsonl and tested it with python eval/eval_openfunction.py --input_file tmp.jsonl. The output is what I would expect:

Accuracy for openfunction: 0 / 4 = 0.00%

Could you have changed the evaluation logic in that part while modifying the code?
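For readers unfamiliar with this kind of check, a regex-based scorer over such a jsonl file might look roughly like the sketch below. It is not the code in eval/eval_openfunction.py: the pattern `CALL_RE` and the helpers `extract_call` and `accuracy` are hypothetical names, and the only thing taken from this thread is the `label` / `predict` layout of each line.

```python
import json
import re

# Hypothetical pattern for "module.function(args)"; NOT taken from the repo's script.
CALL_RE = re.compile(r"[A-Za-z_][\w.]*\s*\(.*\)", re.DOTALL)


def extract_call(text):
    """Return the first call-like substring with all whitespace removed, or None."""
    match = CALL_RE.search(text)
    if match is None:
        return None
    # Assumption: whitespace differences should not count as errors.
    return re.sub(r"\s+", "", match.group(0))


def accuracy(lines):
    """Count records whose predicted call string equals the labeled call string."""
    correct = total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        total += 1
        label = extract_call(record["label"])
        predict = extract_call(record["predict"])
        if label is not None and label == predict:
            correct += 1
    return correct, total
```

Under this sketch a record only counts as correct when the two extracted call strings are identical, so predictions made of unrelated tokens would not be scored as hits. It can be run with `with open("tmp.jsonl", encoding="utf-8") as f: print(accuracy(f))`.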

SXxinxiaosong commented 3 months ago

But why are these predictions just meaningless strings of characters?

rickyang1114 commented 3 months ago

The model may have collapsed after fine-tuning.

SXxinxiaosong commented 3 months ago

My train and test sets are the same data o(≧口≦)o

rickyang1114 commented 3 months ago

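You could check whether there is a problem with the dataset construction, the hyperparameters, and so on, and use intermediate checkpoints from fine-tuning together with the loss curve to track down the error.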
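One quick way to compare checkpoints is to measure how many predictions are not even shaped like a function call. The snippet below is only a sketch under that assumption; `CALL_SHAPE` and `malformed_fraction` are hypothetical names and are not part of this repository.

```python
import json
import re

# Hypothetical shape check: the whole prediction must look like "name.space(args)".
CALL_SHAPE = re.compile(r"[A-Za-z_][\w.]*\(.*\)", re.DOTALL)


def malformed_fraction(lines):
    """Fraction of records whose 'predict' field is not shaped like a single call."""
    total = malformed = 0
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        total += 1
        if CALL_SHAPE.fullmatch(record["predict"].strip()) is None:
            malformed += 1
    return malformed / total if total else 0.0
```

Running it on the generated_predictions.jsonl produced by each saved checkpoint, for example `with open(path, encoding="utf-8") as f: print(malformed_fraction(f))`, gives a rough indication of the point in training at which the outputs stop looking like function calls.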

SXxinxiaosong commented 3 months ago

OK, thank you for the suggestions. I'll keep trying~