thu-coai / KdConv

KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation
Apache License 2.0

What is test_distractors.json? #9

Closed 27182812 closed 3 years ago

27182812 commented 3 years ago

[screenshot of the error] I ran into this issue — could you tell me what test_distractors.json is and where to find it? Thank you very much!

chujiezheng commented 3 years ago

You can find the creation process of *_distractors.json at line 89 of myCoTK/dataloader/bert_dataloader.py.

27182812 commented 3 years ago

I just realized I can write in Chinese. When I ran memseq2seq and LM, both behaved the same way: training finishes fine, but the error above is raised when I run the test step. bert_dataloader.py doesn't seem to be executed during testing — do you mean I need to run it first?

chujiezheng commented 3 years ago

Yes, you need to run the BERT-related models first to generate the distractors. Running the distractors through seq2seq or LM is for selecting the response with the lowest PPL (or loss) among them. If you don't need that part of the experiments, you can just comment it out.
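The selection step described above — scoring each candidate (the gold response plus distractors) with a language model and keeping the lowest-perplexity one — can be sketched as follows. This is an illustrative sketch, not the repo's actual code; the function name and the assumption that you already have a mean per-token loss for each candidate are mine.

```python
import math

def pick_lowest_ppl(candidates, losses):
    """Pick the candidate response with the lowest perplexity.

    candidates: list of candidate response strings (gold response + distractors)
    losses: the model's mean token-level cross-entropy loss for each candidate.

    Since PPL = exp(loss), ranking by loss and ranking by PPL are equivalent;
    we compute PPL explicitly only for readability.
    """
    ppls = [math.exp(loss) for loss in losses]
    best = min(range(len(candidates)), key=lambda i: ppls[i])
    return candidates[best], ppls[best]
```

For example, given three candidates with losses [2.0, 1.0, 3.0], the second candidate is returned, since it has the lowest loss and therefore the lowest PPL.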

DesmonDay commented 3 years ago

Ah, I'd like to ask... the dataset feels messy. It seems test.json doesn't mark which utterance is the response we need to predict? Or is each dialogue itself split into multiple test examples?
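One common way to turn a multi-turn dialogue into multiple test examples — treating each turn (after the first) as a response to predict, with all preceding turns as context — can be sketched like this. The function name and the minimum-context parameter are my own illustration, not necessarily how KdConv's loaders split the data.

```python
def dialogue_to_examples(utterances, min_context=1):
    """Split one multi-turn dialogue into (context, response) test examples.

    utterances: the dialogue's turns in order.
    min_context: smallest number of context turns required before a turn
                 can serve as a prediction target.

    Each turn from index min_context onward becomes a response, paired with
    every turn before it as the context.
    """
    examples = []
    for i in range(min_context, len(utterances)):
        examples.append((utterances[:i], utterances[i]))
    return examples
```

Under this scheme a three-turn dialogue yields two test examples: predict turn 2 from turn 1, and predict turn 3 from turns 1–2.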

chujiezheng commented 3 years ago

You can refer to this:

https://smp2020.aconf.cn/smp.html#3