shawnspace / HRED

The implementation of Hierarchical Recurrent Encoder Decoder Network

Question about input data format #1

Open refreshalways opened 6 years ago

refreshalways commented 6 years ago
  1. What is the input format for prepare_data.py?
  2. What is the difference between vocabulary.txt (prepare_data.py) and rg_vocab.txt (HRED.py)?

Thanks.

shawnspace commented 6 years ago

There are some mismatches between the files I uploaded because I modified them while running my experiments.

I have uploaded my latest version and a lot of lines have been changed. Please check them again.

The input file "dialog.txt" for prepare_context_RG_data.py looks like this:

q1\ta1\tq2\ta2\n q1\ta1\tq2\ta2\n ...

Each line holds one dialog made up of several utterances (like q1 and a1 here), which you can split on '\t'. Each utterance is already word-tokenized, so you can split it on whitespace to get the individual word tokens.
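For illustration, here is a minimal sketch of how one line in that format could be parsed. The function name `parse_dialog_line` and the sample sentence are my own assumptions for this example. This is not the code in prepare_context_RG_data.py, it only shows the tab-then-whitespace splitting described above.

```python
def parse_dialog_line(line):
    """Split one line of dialog.txt into a list of tokenized utterances.

    Assumes utterances are separated by tabs and that the tokens inside
    each utterance are separated by whitespace.
    """
    # Drop the trailing newline, then split the dialog into utterances.
    utterances = line.rstrip("\n").split("\t")
    # Each utterance is already tokenized, so a whitespace split is enough.
    return [utterance.split() for utterance in utterances]


# Hypothetical sample line with four utterances (q1, a1, q2, a2).
sample = "how are you ?\ti am fine .\twhat about you ?\tgreat , thanks\n"
dialog = parse_dialog_line(sample)

for turn, tokens in enumerate(dialog):
    print(turn, tokens)
```

A line with an odd number of utterances, such as q1\ta1\tq2, is split the same way by this sketch. Whether the preprocessing script itself accepts such a line is a separate question.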

To train the model, first run prepare_context_RG_data.py to generate the .tfrecords files. Then run train.py to train the model.

Hope this helps you

refreshalways commented 6 years ago

Thank you for the prompt reply. It does not seem to allow a format like "q1\ta1\q2\n". Any suggestions?

shawnspace commented 6 years ago

I am not sure what the issue with your code is. Could you post more information about what you mean by "it doesn't allow", such as the error message you get? This format works for me on my machine.