nouhadziri / THRED

The implementation of the paper "Augmenting Neural Response Generation with Context-Aware Topical Attention"
https://arxiv.org/abs/1811.01063
MIT License

Regarding using custom dataset #15

Closed (dimeldo closed 4 years ago)

dimeldo commented 4 years ago

I want to try your model on my own customized dataset but I'm overwhelmed by your codebase. So a few questions:

  1. Does the dataset need to include topical words for training, or only for generation? Or both?
  2. If I have a customized dataset, where do I start? From my understanding, I need to:
     A. Create a dialogues file like yours, in your Reddit format.
     B. Use the LDA script to create topical words and attach them to the dataset?
     C. Create a vocabulary file somehow?
     D. Then train?
     Am I missing something? Can you please let me know which file/script I need to use for each stage?

ehsk commented 4 years ago

Thanks for your interest in our work.