xcfcode / PLM_annotator

Codes for our ACL21 paper: Language Model as an Annotator: Exploring DialoGPT for Dialogue Summarization

Question about step 1 (get loss) #7

Closed: RoyZhanyi closed this issue 2 years ago

RoyZhanyi commented 2 years ago

Two questions:

  1. python get_loss.py -d ami runs slowly.

Each sample takes about 1 to 2 minutes to process. Is this a normal speed?

  2. In step 2, python recover_word_loss.py -d [my own dataset] fails with an error:

    Load train_loss.json finished, Data size:100
    Load valid_loss.json finished, Data size:30
    Load test_loss.json finished, Data size:30
    Traceback (most recent call last):
      File "recover_word_loss.py", line 90, in <module>
        process(train_datas, dataset, "train")
      File "recover_word_loss.py", line 70, in process
        res.append(process_one(data))
      File "recover_word_loss.py", line 62, in process_one
        words, losses = recover_word_level(subwords, losses) # recover word-level losses
      File "recover_word_loss.py", line 49, in recover_word_level
        assert len(dialogue.split()) == len(word_level_losses.split())
    AssertionError

    My guess is that the datasets may need to follow a fixed JSON format. I am using [["summary1", "dialogues1"], ... ["summaryN", "dialoguesN"]], where the summary is " summary" and the dialogue is "utterance1\n utterance2\n ... ". If there is a strict format requirement, please let me know!
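The assertion in recover_word_level requires that the word-level losses recovered from subword losses line up one-to-one with dialogue.split(). The Python sketch below is a hypothetical reconstruction of that kind of step, not the repository's code. It assumes the subwords are decoded text pieces in which a piece beginning with whitespace starts a new word, treats whitespace-only pieces (such as a newline) as separators, and averages subword losses per word. The names recover_words and first_mismatch are illustrative; first_mismatch reports where the two word lists first diverge, which can help find the sample that trips the assertion.

```python
from typing import List, Optional, Tuple


def recover_words(pieces: List[str], losses: List[float]) -> Tuple[List[str], List[float]]:
    """Merge decoded subword pieces into whitespace-delimited words (hypothetical sketch).

    Assumptions: a piece that starts with whitespace, or follows a piece that ended
    with whitespace, begins a new word; whitespace-only pieces such as a newline are
    separators whose losses are dropped; a word's loss is the mean of its pieces.
    """
    if len(pieces) != len(losses):
        raise ValueError("expected exactly one loss per subword piece")
    words: List[str] = []
    groups: List[List[float]] = []
    boundary = True  # the first non-whitespace piece always starts a word
    for piece, loss in zip(pieces, losses):
        text = piece.strip()
        if not text:  # whitespace-only piece: it only marks a word boundary
            boundary = True
            continue
        if boundary or piece[0].isspace():
            words.append(text)
            groups.append([loss])
        else:  # continuation subword: glue it onto the current word
            words[-1] += text
            groups[-1].append(loss)
        boundary = piece[-1].isspace()
    return words, [sum(g) / len(g) for g in groups]


def first_mismatch(dialogue: str, words: List[str]) -> Optional[int]:
    """Return the first index where words differ from dialogue.split(), or None."""
    expected = dialogue.split()
    for i, (want, got) in enumerate(zip(expected, words)):
        if want != got:
            return i
    if len(expected) != len(words):
        return min(len(expected), len(words))
    return None


if __name__ == "__main__":
    dialogue = "Hannah : Hey , do you have Betty's number ?\nAmanda : Lemme check"
    # Hand-written pieces for illustration; NOT real DialoGPT tokenizer output.
    pieces = ["Hannah", " :", " Hey", " ,", " do", " you", " have", " Betty", "'s",
              " number", " ?", "\n", "Amanda", " :", " Lem", "me", " check"]
    losses = [1.0] * len(pieces)
    words, word_losses = recover_words(pieces, losses)
    assert len(dialogue.split()) == len(word_losses)  # the invariant the script checks
    print(first_mismatch(dialogue, words))  # None when the two word lists agree
```

The piece list in the usage example is made up for illustration; running first_mismatch on real data only shows where the counts diverge, not why the original script produced a different count.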

Thank you for your work. I hope you can help me resolve this soon; I would be very grateful!

xcfcode commented 2 years ago

Thanks for your interest!

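  1. That is normal. Inputs in the AMI dataset average more than 4,000 words, so processing takes a long time.
  2. Use the same format as the data I released on Google Drive; there is no other special format. You can check whether processing AMI or SAMSum succeeds.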
    [["Hannah needs Betty's number but Amanda doesn't have it . She needs to contact Larry .", "Hannah : Hey , do you have Betty's number ?\nAmanda : Lemme check\nHannah : file_gif\nAmanda : Sorry , can't find it .\nAmanda : Ask Larry\nAmanda : He called her last time we were at the park together\nHannah : I don't know him well\nHannah : file_gif\nAmanda : Don't be shy , he's very nice\nHannah : If you say so . .\nHannah : I'd rather you texted him\nAmanda : Just text him\nHannah : Urgh . . Alright\nHannah : Bye\nAmanda : Bye bye"], ["Eric and Rob are going to watch a stand up on youtube .", "Eric : MACHINE !\nRob : That's so gr8 !\nEric : I know ! And shows how Americans see Russian ;\nRob : And it's really funny !\nEric : I know ! I especially like the train part !\nRob : Hahaha ! No one talks to the machine like that !\nEric : Is this his only stand up ?\nRob : Idk . I'll check .\nEric : Sure .\nRob : Turns out no ! There are some of his stand ups on youtube .\nEric : Gr8 ! I'll watch them now !\nRob : Me too !\nEric : MACHINE !\nRob : MACHINE !\nEric : TTYL ?\nRob : Sure"],......]
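Since the released files are the reference format, one way to compare a custom dataset against them is to check each entry's structure. The Python sketch below is a hypothetical helper, not part of the repository: check_dataset and check_file are illustrative names, and the rules it enforces (a JSON list of [summary, dialogue] string pairs, dialogue lines joined by a bare \n, each line of the form "Speaker : utterance", no stray leading or trailing whitespace) are inferred only from the SAMSum sample above and may not hold for every released dataset.

```python
import json
from typing import Any, List


def check_dataset(data: Any) -> List[str]:
    """List ways a dataset deviates from the SAMSum sample's layout (hypothetical checker).

    Expected shape, inferred only from the released sample: a JSON list of
    [summary, dialogue] string pairs, where dialogue lines are joined by a bare
    newline and each line looks like "Speaker : utterance".
    """
    if not isinstance(data, list):
        return ["top level should be a JSON list of [summary, dialogue] pairs"]
    problems: List[str] = []
    for i, item in enumerate(data):
        if not (isinstance(item, list) and len(item) == 2
                and all(isinstance(x, str) for x in item)):
            problems.append(f"item {i}: expected [summary, dialogue] as two strings")
            continue
        summary, dialogue = item
        if summary != summary.strip():
            problems.append(f"item {i}: summary has leading/trailing whitespace")
        for j, line in enumerate(dialogue.split("\n")):
            if line != line.strip():
                problems.append(f"item {i}, line {j}: leading/trailing whitespace")
            if " : " not in line:
                problems.append(f"item {i}, line {j}: no 'Speaker : utterance' separator")
    return problems


def check_file(path: str) -> List[str]:
    """Load a JSON dataset file and run check_dataset on its contents."""
    with open(path, encoding="utf-8") as f:
        return check_dataset(json.load(f))
```

An empty result only means the data matches what the sample shows; the thread does not confirm which difference, if any, triggers the AssertionError reported in the question.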