Open sinlin0908 opened 4 years ago
Hi there, I hope I can help. I'm only using the DailyDialog dataset.
"However, it only uses the training data to build the vocabulary in the code." The paper mentions the train/valid/test ratio, and if you check "DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset", the vocabulary size reported there is larger than the one used here. However, given the sizes of the train/valid/test sets, it is fair to assume that the missing tokens are very rare.
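To make the point about rare missing tokens concrete, here is a minimal sketch (not the code from data.py) that builds a vocabulary from the training split only and then measures what fraction of tokens in another split are unseen. The function names `build_vocab` and `oov_rate`, the whitespace tokenization, the `<pad>`/`<unk>` specials, and the toy dialogues are all assumptions made for illustration.

```python
from collections import Counter

def build_vocab(dialogues, specials=("<pad>", "<unk>")):
    """Map every token seen in `dialogues` to an integer id (hypothetical helper)."""
    counts = Counter(tok for d in dialogues for utt in d for tok in utt.split())
    itos = list(specials) + sorted(counts)  # specials first, then tokens in sorted order
    return {tok: i for i, tok in enumerate(itos)}

def oov_rate(dialogues, vocab):
    """Fraction of tokens in `dialogues` that are missing from `vocab`."""
    tokens = [tok for d in dialogues for utt in d for tok in utt.split()]
    if not tokens:
        return 0.0
    missing = sum(tok not in vocab for tok in tokens)
    return missing / len(tokens)

# Toy data: each dialogue is a list of utterances (assumed layout).
train = [["how are you ?", "i am fine ."]]
test = [["how are you today ?"]]

vocab = build_vocab(train)    # vocabulary from the training split only
rate = oov_rate(test, vocab)  # share of test tokens that would map to <unk>
print(f"OOV rate on test: {rate:.2%}")
```

Running the same check on the real valid/test splits would show how many tokens are actually lost by building the vocabulary from the training data alone.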
"Is it essential to add [\<s>,\<d>,\</s>] at the start of the dialogue?"
Yes, with this you tell the dialogue system to reset and start over, or that the sentence following [\<s>,\<d>,\</s>] belongs to another topic.
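As a rough illustration of that idea, the sketch below prepends the marker sequence to a dialogue before its utterances. It is not the repository's implementation: `format_dialogue` is a hypothetical helper, and wrapping each utterance in `<s>` ... `</s>` and using whitespace tokenization are assumptions made for the example.

```python
# Marker sequence that signals "a new dialogue starts here".
DIALOGUE_START = ["<s>", "<d>", "</s>"]

def format_dialogue(utterances):
    """Return a list of token lists, starting with the dialogue-start marker."""
    formatted = [list(DIALOGUE_START)]
    for utt in utterances:
        # Assumption: each utterance is wrapped in sentence delimiters.
        formatted.append(["<s>"] + utt.split() + ["</s>"])
    return formatted

out = format_dialogue(["hi there", "hello"])
for turn in out:
    print(" ".join(turn))
```

When several dialogues are concatenated into one stream, the marker turn is the only thing that tells the model where one conversation ends and the next begins, which is why dropping it can blur unrelated topics together.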
Thank you for the explanation!
Hello, thanks for open-sourcing your code. I am trying to understand it, but the data preprocessing in data.py is confusing to me.
When building the vocabulary:
What do the train size, valid size, and test size mean? All of them have the value 2, since each is a tuple of length 2.
Do you mean that all vocabularies are from the training, testing, and validation data? However, it only uses the training data to build the vocabulary in the code.
When formatting the dialogue: is it essential to add [\<s>,\<d>,\</s>] at the start of the dialogue? Can I leave it out?
Thank you.