Question about dataset construction

squareRoot3 / Target-Guided-Conversation

"Target-Guided Open-Domain Conversation" in ACL 2019

https://aclanthology.org/P19-1565/


Question about dataset construction #2

Closed. qichaotang closed this issue 5 years ago

qichaotang commented 5 years ago

Why do all_none_original_no_cands.txt and candi_keyword.txt match up so well? How were these keyword candidates generated? Also, have you tried the model on a Chinese dataset?

squareRoot3 commented 5 years ago

(1/2) To generate the word list in "preprocess/candi_keyword.txt", we first converted all verbs, nouns and adjectives to their base form using WordNet, and then deleted the words that occur fewer than 11 times. We also deleted some words that are unsuitable as a target (e.g. have/haha).
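The filtering steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual preprocessing script: the function name `build_candidate_keywords`, the coarse POS tags `"v"`, `"n"`, `"a"`, the `blocklist` argument, and the toy lemmatizer are all assumptions made for the example. In practice a WordNet-based lemmatizer (for instance NLTK's `WordNetLemmatizer().lemmatize(word, pos)`) could be passed in place of `toy_lemmatize`.

```python
from collections import Counter
from typing import AbstractSet, Callable, Iterable, List, Tuple

# Coarse POS tags kept as candidates: verbs, nouns, adjectives (assumed tag set).
CONTENT_POS = {"v", "n", "a"}


def build_candidate_keywords(
    tagged_tokens: Iterable[Tuple[str, str]],
    lemmatize: Callable[[str, str], str],
    min_count: int = 11,
    blocklist: AbstractSet[str] = frozenset(),
) -> List[str]:
    """Return sorted base forms that occur at least `min_count` times."""
    # Step 1: reduce every verb, noun and adjective to its base form and count it.
    counts = Counter(
        lemmatize(word.lower(), pos)
        for word, pos in tagged_tokens
        if pos in CONTENT_POS
    )
    # Step 2: drop rare words (fewer than `min_count` occurrences).
    # Step 3: drop words hand-picked as unsuitable targets (e.g. "have", "haha").
    return sorted(
        word
        for word, count in counts.items()
        if count >= min_count and word not in blocklist
    )


# Hypothetical stand-in for a WordNet lemmatizer, used only for this example.
TOY_LEMMAS = {("dogs", "n"): "dog", ("walked", "v"): "walk"}


def toy_lemmatize(word: str, pos: str) -> str:
    return TOY_LEMMAS.get((word, pos), word)


example_tokens = (
    [("dogs", "n")] * 6
    + [("dog", "n")] * 5
    + [("walked", "v")] * 10
    + [("have", "v")] * 20
    + [("the", "d")] * 50
)
example_keywords = build_candidate_keywords(
    example_tokens, toy_lemmatize, min_count=11, blocklist={"have", "haha"}
)
```

In this toy input, "dog" is kept because its two surface forms together reach 11 occurrences, "walk" is dropped as too rare, "have" is dropped by the blocklist, and "the" is ignored because it is not a verb, noun or adjective.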

(3) Our model has not been tested on any Chinese dataset yet. In my opinion, conversation quality and keyword selection are more important than the language type.

qichaotang commented 5 years ago

Got it, thanks.