I want to try your model on my own customized dataset but I'm overwhelmed by your codebase. So a few questions:
Does the dataset need to include topic words for the training procedure, only for generation, or both?
If I have a customized dataset, where do I start? From my understanding I need to:
A. Create a dialogues file like yours, in your Reddit format.
B. Use the LDA script to generate topic words and attach them to the dataset?
C. Create a vocabulary file somehow?
D. Then train?
Am I missing something?
Can you please let me know which file/script I need to use for each stage?
For THRED and Topic-Aware seq2seq, you need topic words for training and generation. However, for HRED and vanilla seq2seq, you don't need topic words at all.
To be able to use your own data, just make sure the data format matches the description.
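For illustration, here is a minimal sketch of what preparing such a file might look like. It assumes each line is one dialogue with utterances separated by TAB characters; the filename, the example utterances, and the TAB convention itself are assumptions for illustration, so confirm them against the repository's data-format description before training:

```python
# Hypothetical sketch: write a dialogue file where each line is one
# conversation and utterances are TAB-separated. The TAB separator is an
# assumption; check it against the repo's documented data format.
dialogues = [
    ["how do i train on my own data", "match the documented format first"],
    ["does hred need topic words", "no, only thred and ta-seq2seq do"],
]

with open("my_dialogues.txt", "w", encoding="utf-8") as f:
    for utterances in dialogues:
        f.write("\t".join(utterances) + "\n")

# Sanity check: every line should contain at least two utterances.
with open("my_dialogues.txt", encoding="utf-8") as f:
    for line_no, line in enumerate(f, 1):
        turns = line.rstrip("\n").split("\t")
        assert len(turns) >= 2, f"line {line_no} has fewer than 2 utterances"
```

The assertion pass at the end is just a cheap way to catch malformed lines before they reach the training pipeline.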
You can use the provided pre-trained LDA model (here) to infer topic words for your own data, as follows:
```
python thred/topic_model/lda.py --mode infer --dialogue_as_doc \
    --model_dir <PATH TO THE DOWNLOADED MODEL> --test_data <PATH TO YOUR DATA FILE>
```
The vocabulary file will be automatically created during training, so no need to worry about it.
You are all set to train the model! See here for more information.