Closed RoyZhanyi closed 2 years ago
Thanks for your attention!
[["Hannah needs Betty's number but Amanda doesn't have it . She needs to contact Larry .", "Hannah : Hey , do you have Betty's number ?\nAmanda : Lemme check\nHannah : file_gif\nAmanda : Sorry , can't find it .\nAmanda : Ask Larry\nAmanda : He called her last time we were at the park together\nHannah : I don't know him well\nHannah : file_gif\nAmanda : Don't be shy , he's very nice\nHannah : If you say so . .\nHannah : I'd rather you texted him\nAmanda : Just text him\nHannah : Urgh . . Alright\nHannah : Bye\nAmanda : Bye bye"], ["Eric and Rob are going to watch a stand up on youtube .", "Eric : MACHINE !\nRob : That's so gr8 !\nEric : I know ! And shows how Americans see Russian ;\nRob : And it's really funny !\nEric : I know ! I especially like the train part !\nRob : Hahaha ! No one talks to the machine like that !\nEric : Is this his only stand up ?\nRob : Idk . I'll check .\nEric : Sure .\nRob : Turns out no ! There are some of his stand ups on youtube .\nEric : Gr8 ! I'll watch them now !\nRob : Me too !\nEric : MACHINE !\nRob : MACHINE !\nEric : TTYL ?\nRob : Sure"],......]
Two problems:
python get_loss.py -d ami
runs slowly;
python recover_word_loss.py -d [my own dataset]
fails with the following error:
Load train_loss.json finished, Data size:100
Load valid_loss.json finished, Data size:30
Load test_loss.json finished, Data size:30
Traceback (most recent call last):
  File "recover_word_loss.py", line 90, in <module>
    process(train_datas, dataset, "train")
  File "recover_word_loss.py", line 70, in process
    res.append(process_one(data))
  File "recover_word_loss.py", line 62, in process_one
    words, losses = recover_word_level(subwords, losses) # recover word-level losses
  File "recover_word_loss.py", line 49, in recover_word_level
    assert len(dialogue.split()) == len(word_level_losses.split())
AssertionError
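The failing assertion compares the number of whitespace-separated tokens in the dialogue with the number of word-level losses. A minimal sketch for reproducing that comparison outside the script, assuming (from the traceback line only) that both values are plain strings. The helper name `token_count_gap` and the example values are hypothetical and are not taken from recover_word_loss.py or from the real loss files:

```python
def token_count_gap(dialogue: str, word_level_losses: str) -> int:
    """Return how many more whitespace tokens the dialogue has than the losses.

    Mirrors the condition in the traceback's assert; 0 means it would pass.
    """
    return len(dialogue.split()) - len(word_level_losses.split())


# Hypothetical values for illustration only.
dialogue = "Hannah : Hey , do you have Betty's number ?"
losses = "0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9"

gap = token_count_gap(dialogue, losses)
print(gap)  # non-zero means the assert would fail for this example
```

Running a check like this per example should at least show which dialogues trigger the AssertionError and by how many tokens they differ.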
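My guess is that the datasets need to follow a fixed JSON format? I am using the format [["summary1", "dialogues1"], ... ["summaryN", "dialoguesN"]], where each summary is a plain string and each dialogue has the form "utterance1\n utterance2\n ... ". If there are strict format requirements, please let me know!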
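To rule out problems on my side, the layout described above can be checked with a short script. This is only a sketch of the format I am using, not the format the repository requires (which is what I am asking about). The function name `validate_pairs` is hypothetical, and the "speaker : text" check is an assumption based on my sample data:

```python
import json


def validate_pairs(data):
    """Return a list of problems; an empty list means the layout matches."""
    if not isinstance(data, list):
        return ["top level is not a list"]
    problems = []
    for i, item in enumerate(data):
        # Each entry should be a two-element [summary, dialogue] list.
        if not (isinstance(item, list) and len(item) == 2):
            problems.append(f"item {i}: expected [summary, dialogue]")
            continue
        summary, dialogue = item
        if not isinstance(summary, str) or not isinstance(dialogue, str):
            problems.append(f"item {i}: summary and dialogue must be strings")
            continue
        # Assumption: every utterance looks like "speaker : text".
        for j, utterance in enumerate(dialogue.split("\n")):
            if " : " not in utterance:
                problems.append(f"item {i}, utterance {j}: no 'speaker : text' separator")
    return problems


# Shortened version of one pair from the sample data in this post.
sample = json.loads(
    '[["Eric and Rob are going to watch a stand up on youtube .",'
    ' "Eric : MACHINE !\\nRob : Sure"]]'
)
print(validate_pairs(sample))
```

An empty list only confirms that the file matches the layout I described; it says nothing about whether the scripts accept that layout.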
Thank you for your work. I hope you can help resolve my problem soon; I would really appreciate it!