problem of data preprocess

vikigenius / conditional_text_generation

Adversarial Latent Space model for dialog generation

Creative Commons Zero v1.0 Universal


problem of data preprocess #18

Open thinkingmanyangyang opened 3 years ago

thinkingmanyangyang commented 3 years ago

How do you preprocess the data, for example for DailyDialog? I see that you directly use the file "data/interim/dialog/train_sentences.tsv". How did you generate it? Thank you.

vikigenius commented 3 years ago

For training the VAE, you just need all the utterances in the training dataset. Drop the context/turn information and extract every utterance independently. If you are using a different training script, make sure to shuffle them.
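
A minimal sketch of that extraction, not the script used in the repository. It assumes the raw DailyDialog layout of one dialogue per line with utterances separated by `__eou__`; the helper names `split_dialogue` and `extract_utterances` and the fixed seed are hypothetical, chosen only for illustration.

```python
import random

EOU = "__eou__"  # assumed utterance separator in the raw DailyDialog text file


def split_dialogue(line):
    """Split one raw dialogue line into stripped, non-empty utterances."""
    return [u.strip() for u in line.split(EOU) if u.strip()]


def extract_utterances(dialogue_lines, seed=0):
    """Flatten dialogues into a shuffled list of standalone utterances."""
    utterances = []
    for line in dialogue_lines:
        # Turn order and dialogue boundaries are discarded on purpose.
        utterances.extend(split_dialogue(line))
    # Seeded shuffle so the output is reproducible between runs.
    random.Random(seed).shuffle(utterances)
    return utterances


# Tiny in-memory stand-in for the training split (hypothetical data).
sample = [
    "Hi , how are you ? __eou__ Fine , thanks . __eou__",
    "Nice weather today . __eou__ Yes , it is . __eou__ Let's go out . __eou__",
]
vae_utterances = extract_utterances(sample)
```

Writing `vae_utterances` one per line should give something close to `train_sentences.tsv`, though the exact column layout the training script expects is not stated here and would need to be checked against the data loader.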

For the GAN, preprocessing consisted of taking all pairs of consecutive utterances from the original dataset and deduplicating them.
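
The pairing step could look roughly like the sketch below. Again this is an illustration under assumptions rather than the original script: it assumes the raw DailyDialog layout (one dialogue per line, utterances separated by `__eou__`), and `consecutive_pairs` is a hypothetical helper name.

```python
EOU = "__eou__"  # assumed utterance separator in the raw DailyDialog text file


def consecutive_pairs(dialogue_lines):
    """Return unique (utterance, next_utterance) pairs in first-seen order."""
    seen = set()
    pairs = []
    for line in dialogue_lines:
        turns = [u.strip() for u in line.split(EOU) if u.strip()]
        # zip(turns, turns[1:]) yields each utterance with its successor;
        # pairs never cross a dialogue boundary.
        for pair in zip(turns, turns[1:]):
            if pair not in seen:  # deduplicate across the whole dataset
                seen.add(pair)
                pairs.append(pair)
    return pairs


# Tiny in-memory stand-in for the dataset (hypothetical data).
sample_dialogues = [
    "Hi . __eou__ Hello . __eou__ How are you ? __eou__",
    "Hi . __eou__ Hello . __eou__ Long time no see . __eou__",
]
gan_pairs = consecutive_pairs(sample_dialogues)
```

Each pair can then be written as a tab-separated row, for example with `"\t".join(pair)`, provided the utterances themselves contain no tab characters.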