thu-coai / KdConv

KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation
Apache License 2.0

Question about the details of the dataset #15

Closed. DesmonDay closed this issue 3 years ago

DesmonDay commented 3 years ago

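Hello. In the KdConv dataset statistics table, does "Avg. # tokens per utterance" refer to the number of words after word segmentation? Also, for "Avg. # characters per utterance", which is counted by splitting into characters: if an English word such as "utterance" appears, is it counted as length 9? Thank you!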
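To make the two readings of the character count concrete, here is a minimal sketch. It is not the KdConv authors' statistics script. The function names and the example sentence are hypothetical, and ignoring whitespace is an assumption. The first function counts every character, so an English word contributes one unit per letter. The second treats a run of ASCII letters or digits as a single unit.

```python
import re


def count_characters(utterance: str) -> int:
    """Count every non-whitespace character; each Latin letter counts as 1.

    Assumption: whitespace is ignored. The dataset's real rule may differ.
    """
    return sum(1 for ch in utterance if not ch.isspace())


def count_units_with_whole_words(utterance: str) -> int:
    """Alternative reading: a run of ASCII letters/digits counts as one unit,
    and every other non-whitespace character counts as one unit."""
    return len(re.findall(r"[A-Za-z0-9]+|\S", utterance))


# Hypothetical mixed Chinese/English utterance, for illustration only.
example = "这个utterance很短"

print(count_characters(example))              # 2 + 9 + 2 characters
print(count_units_with_whole_words(example))  # 2 + 1 + 2 units
```

Under the first reading the word "utterance" alone has length 9, which is the case the question asks about. Under the second it has length 1. A token count after word segmentation would need a Chinese word segmenter and is not shown here.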