nouhadziri / THRED

The implementation of the paper "Augmenting Neural Response Generation with Context-Aware Topical Attention"
https://arxiv.org/abs/1811.01063
MIT License
111 stars, 25 forks

about using other dataset #2

Closed: zh215021 closed this issue 5 years ago

zh215021 commented 5 years ago

Hi, thank you for your work. I have a question about the input data. If I want to use my own data, how should I build the dataset? As I understand it, each line corresponds to a single conversation, like this:

q1 \t a1 \t q2 \t a2 \t | topic word

I'm not sure I have understood this correctly. Thanks again~

ehsk commented 5 years ago

Thanks for reaching out. As you said, each line represents a single conversation. Utterances are separated by TAB, and topic words come after the last utterance, delimited by a vertical bar, so the format would be something like this:

q1 \t a1 \t q2 \t a2 | topic words

Note that there is no TAB after the last utterance (i.e., a2) and two spaces should come before and after the vertical bar. Please take a look at this sample file.
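
For anyone preparing their own data, here is a minimal Python sketch of the line format described above. It is not code from the THRED repository: the function names `format_conversation` and `parse_conversation` and the `bar_sep` parameter are hypothetical, and the default separator `" | "` (one space on each side of the bar) is an assumption taken from the example line, so adjust it to match the whitespace in the sample file.

```python
def format_conversation(utterances, topic_words, bar_sep=" | "):
    """Build one dataset line: TAB-separated utterances, then a bar, then topic words.

    bar_sep is an assumption; set it to whatever the sample file uses.
    """
    if not utterances:
        raise ValueError("a conversation needs at least one utterance")
    # Collapse tabs/newlines inside each utterance so TAB stays a pure delimiter.
    cleaned = [" ".join(u.split()) for u in utterances]
    # No TAB is added after the last utterance.
    return "\t".join(cleaned) + bar_sep + " ".join(topic_words)


def parse_conversation(line, bar_sep=" | "):
    """Split one dataset line back into (utterances, topic_words)."""
    line = line.rstrip("\n")
    if bar_sep not in line:
        raise ValueError("line has no topic-word delimiter")
    # Split on the last delimiter, since topic words follow the final utterance.
    body, _, topics = line.rpartition(bar_sep)
    return body.split("\t"), topics.split()


if __name__ == "__main__":
    line = format_conversation(["q1", "a1", "q2", "a2"], ["topic", "words"])
    print(repr(line))
    print(parse_conversation(line))
```

Round-tripping each line through `parse_conversation` before training is a cheap way to catch a stray TAB after the last utterance or a missing delimiter.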

zh215021 commented 5 years ago

thank you for your help~