Open thinkingmanyangyang opened 3 years ago
To train the VAE, you only need the utterances from the training dataset. Remove the context/turn information and extract every utterance independently. If you are using a different training script, make sure to shuffle them.
For the GAN, the preprocessing was done by taking all pairs of consecutive utterances from the original dataset and deduplicating them.
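In case it helps, here is a minimal sketch of both steps in Python. It is not the script used in this repo, just one way to do what is described above. The `__eou__` separator, the function names, and the sample lines are assumptions for illustration; adjust the parsing to whatever your raw files look like. It also does not write `train_sentences.tsv`, since the exact column layout of that file is not described here.

```python
import random


def split_dialog(line, sep="__eou__"):
    """Split one raw dialog line into stripped, non-empty utterances.

    The separator is an assumption about the raw file format.
    """
    return [u.strip() for u in line.split(sep) if u.strip()]


def vae_utterances(dialogs, seed=0):
    """Flatten dialogs into a shuffled list of standalone utterances."""
    utterances = [u for dialog in dialogs for u in dialog]
    # Shuffle with a fixed seed so the result is reproducible.
    random.Random(seed).shuffle(utterances)
    return utterances


def gan_pairs(dialogs):
    """Collect consecutive (utterance, next utterance) pairs without duplicates."""
    seen = set()
    pairs = []
    for dialog in dialogs:
        for pair in zip(dialog, dialog[1:]):
            if pair not in seen:
                seen.add(pair)
                pairs.append(pair)
    return pairs


# Hypothetical sample input: one dialog per line.
raw_lines = [
    "Hi . __eou__ Hello . __eou__ How are you ? __eou__",
    "Hi . __eou__ Hello . __eou__ Fine , thanks . __eou__",
]
dialogs = [split_dialog(line) for line in raw_lines]
sentences = vae_utterances(dialogs)  # context-free utterances for the VAE
pairs = gan_pairs(dialogs)           # deduplicated consecutive pairs for the GAN
```

Note that the sketch only deduplicates the pairs. The utterance list for the VAE keeps repeated sentences, since nothing above says to remove them.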
How do you preprocess the data, for example for DailyDialog? I see that you are directly using the file "data/interim/dialog/train_sentences.tsv". How did you get it? Thank you.