Question about step 1 (get loss)

xcfcode / PLM_annotator

Codes for our ACL21 paper: Language Model as an Annotator: Exploring DialoGPT for Dialogue Summarization


Two questions:

Running python get_loss.py -d ami is slow.

Each sample takes about 1-2 minutes to process. Is that the expected speed?

In step 2, python recover_word_loss.py -d [my own dataset] fails with the following error:

    Load train_loss.json finished, Data size:100
    Load valid_loss.json finished, Data size:30
    Load test_loss.json finished, Data size:30
    Traceback (most recent call last):
      File "recover_word_loss.py", line 90, in <module>
        process(train_datas, dataset, "train")
      File "recover_word_loss.py", line 70, in process
        res.append(process_one(data))
      File "recover_word_loss.py", line 62, in process_one
        words, losses = recover_word_level(subwords, losses) # recover word-level losses
      File "recover_word_loss.py", line 49, in recover_word_level
        assert len(dialogue.split()) == len(word_level_losses.split())
    AssertionError

My guess is that the datasets must follow a fixed JSON format. I am using the format [["summary1", "dialogues1"], ... ["summaryN", "dialoguesN"]], where each summary is " summary " and each dialogue is "utterance1\n utterance2\n ... ". If there is a strict format requirement, please let me know!
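
To narrow down which samples trip the assertion, one option is to run the same comparison outside the script and collect the failures instead of stopping at the first one. The sketch below only mirrors the check shown in the traceback (whitespace-split word count versus whitespace-split loss count). The function name, the pair layout, and the loss values are hypothetical and are not taken from the repository.

```python
def find_length_mismatches(pairs):
    """Return (index, n_words, n_losses) for each pair whose counts differ.

    `pairs` is a hypothetical list of (dialogue, word_level_losses) strings;
    the real script may store these differently.
    """
    mismatches = []
    for i, (dialogue, word_level_losses) in enumerate(pairs):
        # Same comparison as the assert in recover_word_level.
        n_words = len(dialogue.split())
        n_losses = len(word_level_losses.split())
        if n_words != n_losses:
            mismatches.append((i, n_words, n_losses))
    return mismatches


# Made-up loss values, for illustration only.
example_pairs = [
    ("Hannah : Bye", "0.5 0.1 2.3"),
    ("Amanda : Bye bye", "0.4 0.1 1.9"),
]
print(find_length_mismatches(example_pairs))
```

Printing the offending dialogue for each reported index should show whether particular tokens (for example unusual punctuation or whitespace) are what make the two counts diverge.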

Thank you for your work. I hope you can help resolve this soon; I would be very grateful!

[["Hannah needs Betty's number but Amanda doesn't have it . She needs to contact Larry .", "Hannah : Hey , do you have Betty's number ?\nAmanda : Lemme check\nHannah : file_gif\nAmanda : Sorry , can't find it .\nAmanda : Ask Larry\nAmanda : He called her last time we were at the park together\nHannah : I don't know him well\nHannah : file_gif\nAmanda : Don't be shy , he's very nice\nHannah : If you say so . .\nHannah : I'd rather you texted him\nAmanda : Just text him\nHannah : Urgh . . Alright\nHannah : Bye\nAmanda : Bye bye"], ["Eric and Rob are going to watch a stand up on youtube .", "Eric : MACHINE !\nRob : That's so gr8 !\nEric : I know ! And shows how Americans see Russian ;\nRob : And it's really funny !\nEric : I know ! I especially like the train part !\nRob : Hahaha ! No one talks to the machine like that !\nEric : Is this his only stand up ?\nRob : Idk . I'll check .\nEric : Sure .\nRob : Turns out no ! There are some of his stand ups on youtube .\nEric : Gr8 ! I'll watch them now !\nRob : Me too !\nEric : MACHINE !\nRob : MACHINE !\nEric : TTYL ?\nRob : Sure"],......]
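
A quick structural check of a file in this shape could look like the sketch below. It assumes only what is described in this issue: a top-level list of [summary, dialogue] pairs, with utterances separated by "\n" and written as "speaker : text". Whether the repository's scripts actually require this layout is not confirmed here, and validate_dataset is a hypothetical helper, not part of PLM_annotator.

```python
import json


def validate_dataset(entries):
    """Return a list of human-readable problems found in `entries`."""
    if not isinstance(entries, list):
        return ["top level is not a list"]
    problems = []
    for i, entry in enumerate(entries):
        if not (isinstance(entry, list) and len(entry) == 2):
            problems.append(f"entry {i}: expected a [summary, dialogue] pair")
            continue
        summary, dialogue = entry
        if not isinstance(summary, str) or not isinstance(dialogue, str):
            problems.append(f"entry {i}: summary and dialogue must be strings")
            continue
        # Assumption: every utterance looks like "speaker : text".
        for j, utterance in enumerate(dialogue.split("\n")):
            if " : " not in utterance:
                problems.append(f"entry {i}, utterance {j}: no 'speaker : text' separator")
    return problems


# Tiny in-memory sample; the second entry is deliberately malformed.
sample = json.loads(
    '[["Eric and Rob are going to watch a stand up on youtube .",'
    ' "Eric : TTYL ?\\nRob : Sure"], ["only one field"]]'
)
print(validate_dataset(sample))
```

For a real file, the same function can be applied to the result of json.load on the opened dataset file.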

Question about step 1 (get loss) #7