Question about the PersonaChat data

songhaoyu / BoB

The released codes for ACL 2021 paper 'BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data'

https://aclanthology.org/2021.acl-long.14/

Apache License 2.0


Question about the PersonaChat data #5

Closed xiaolan98 closed 3 years ago

xiaolan98 commented 3 years ago

Hi, in the paper, Table 1 shows that there are 7,801 dialogues in the test set, but I can't find that test set in the data folder here. Does that data refer to the PersonaChat data?

Thanks.

haoyusoong commented 3 years ago

Hi there, the test set in ConvAI2 is private and has not been released yet. As a result, all published comparable results are based on the valid set, i.e., the test set in the original PersonaChat. Despite the different test sets, ConvAI2 and PersonaChat are the same dataset. Here is the link to download the 'ConvAI2' version: http://parl.ai/downloads/convai2/convai2_fix_723.tgz .

xiaolan98 commented 3 years ago

Thanks for your reply. But why are the statistics of the valid and test sets not the same in Table 1 of the original paper?

By the way, is there any difference between ConvAI2 and PersonaChat?

haoyusoong commented 3 years ago

The reported #Valid is a split from the training set; #Test is the number of input-response pairs in _valid_self_original_nocands.txt. The only difference between PersonaChat and ConvAI2 is the hidden test set. I use the data link above, so I refer to this dataset as ConvAI2.
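For anyone who wants to check the #Test figure themselves, here is a rough sketch of how the input-response pairs in that file could be counted. It is not the script used for the paper. It assumes the usual ConvAI2 text layout: each line starts with a turn index, persona lines begin with "your persona:" (or "partner's persona:"), and each dialogue line holds the input and the response separated by a tab. The function name `count_input_response_pairs` and the sample lines are made up for illustration.

```python
from typing import Iterable


def count_input_response_pairs(lines: Iterable[str]) -> int:
    """Count dialogue lines that carry both an input and a response.

    Assumed layout (not confirmed by the thread): "<index> <input>\t<response>"
    for dialogue turns and "<index> your persona: ..." for persona lines.
    """
    pairs = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Drop the leading turn index ("1 ", "2 ", ...).
        _, _, body = line.partition(" ")
        # Persona lines describe a speaker and are not dialogue turns.
        if body.startswith(("your persona:", "partner's persona:")):
            continue
        # A dialogue turn needs a non-empty input and a non-empty response.
        parts = body.split("\t")
        if len(parts) >= 2 and parts[0].strip() and parts[1].strip():
            pairs += 1
    return pairs


# Hypothetical sample in the assumed format: two short dialogues.
sample = [
    "1 your persona: i like to ski.\n",
    "2 your persona: i have two dogs.\n",
    "3 hi , how are you ?\ti am fine , just walked my dogs .\n",
    "4 do you ski ?\tyes , every winter .\n",
    "1 your persona: i am a teacher.\n",
    "2 hello !\thi there .\n",
]
print(count_input_response_pairs(sample))  # prints 3
```

To run it on the real data, open the extracted file and pass the file object in, e.g. `with open(path, encoding="utf-8") as f: print(count_input_response_pairs(f))`. Whether the result matches the number in Table 1 depends on the format assumptions above holding for that file.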

xiaolan98 commented 3 years ago

Got it. Thanks for your reply.