Paper question: "The GPT2-chitchat reaches the highest distinct scores but poor generation quality where we attribute it to the small scale of the model."

thu-coai / CDial-GPT

A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models

MIT License


Issue #53

Closed: Ultraman-Orb closed this issue 3 years ago

Ultraman-Orb commented 3 years ago

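GPT2-chitchat gets the highest distinct scores, yet the paper says its generation quality is poor. Which evaluation metric is that judgment of poor quality based on? I would also like to know whether the reported metrics (PPL, dist, Greedy Matching, Embedding Average) were measured on data/STC_test.json. Could you release the code for these metrics (ppl, dist, Greedy Matching, Embedding Average)?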

lemon234071 commented 3 years ago

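1. A higher dist score is not necessarily better: fully garbled text also gets a high dist score. In the end, dialogue quality was judged by human evaluation. Personal opinion: GPT2-chitchat mainly suffers from a small parameter count and a small pre-training dataset, and fine-tuning it on a larger dataset (STC) is not a good training strategy. The model architecture is the same, though, and empirically more parameters and more pre-training data give better results.
2. Yes, they were measured on STC_test.json.
3. References for the metrics:
   - ppl: https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/utils/statistics.py
   - dist: https://github.com/microsoft/DialoGPT
   - Greedy Matching: https://blog.csdn.net/qq_33772192/article/details/88936473
   - Embedding Average: https://blog.csdn.net/qq_33772192/article/details/88943393
   - Word embeddings: https://ai.tencent.com/ailab/nlp/en/embedding.html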
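To illustrate the point that garbled text also scores high on dist, here is a minimal sketch of a corpus-level distinct-n computation. It is not the script used for the paper's numbers: the function name `distinct_n`, the character-level tokenization, and the toy responses are all assumptions made for illustration, and the linked DialoGPT code may differ in details such as tokenization or averaging.

```python
from collections import Counter


def distinct_n(responses, n):
    """Corpus-level distinct-n: number of unique n-grams / total n-grams.

    `responses` is a list of token lists. Returns 0.0 when there are no n-grams.
    """
    ngrams = Counter()
    for tokens in responses:
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0


# Hypothetical fluent responses, tokenized per character (an assumption).
fluent = [
    list("我也很喜欢这首歌"),
    list("我也很喜欢这部电影"),
    list("我也觉得很好看"),
]

# Hypothetical "garbled" responses: 24 unrelated CJK characters, none repeated,
# split into three responses of eight characters each.
chars = [chr(0x4E00 + 97 * k) for k in range(24)]
garbled = [chars[0:8], chars[8:16], chars[16:24]]

for name, corpus in [("fluent", fluent), ("garbled", garbled)]:
    print(name, distinct_n(corpus, 1), distinct_n(corpus, 2))
```

Because no character repeats in the garbled set, it scores 1.0 on both distinct-1 and distinct-2, higher than the fluent set, even though it is meaningless. That is why dist has to be read alongside human evaluation or other quality metrics.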

Ultraman-Orb commented 3 years ago

Regarding point 3 of the reply above: is there any difficulty in implementing these metrics?

Thank you very much!