Closed: jld23 closed this issue 7 years ago
Hi
Thanks! I have made the training data that I collected for this pre-trained model available; see https://www.dropbox.com/sh/o0rze9dulwmon8b/AAA6g6QoKM8hBEHGst6W4JGDa?dl=0 . You can now repeat the whole process to obtain the trained bot. You will see that the bot can chat with you after only 50 training epochs (about half an hour on a GPU with the learning rate set to lr=0.0005). This is not a usual seq2seq model; this model can learn faster. I changed split_qa.py (line 8) to set the correct file name.
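For anyone adapting this to their own data: a minimal sketch of what a question/answer split like the one split_qa.py performs could look like. This is an assumption for illustration, not the repo's actual code; the function name `split_qa` and the even/odd pairing convention are mine, and only the file name `movie_data.txt` comes from the thread.

```python
# Hedged sketch (NOT the repo's actual split_qa.py): split a plain-text
# dialogue file into alternating question/answer lines. "movie_data.txt"
# is the file name referenced at line 8 of split_qa.py; the pairing rule
# (even lines = questions, odd lines = answers) is an assumption.
def split_qa(path="movie_data.txt"):
    with open(path, encoding="utf-8") as f:
        lines = [ln.strip() for ln in f if ln.strip()]
    questions = lines[0::2]  # even-indexed lines as questions
    answers = lines[1::2]    # odd-indexed lines as answers
    return questions, answers
```

With a four-line file containing "hi / hello / how are you / fine", this would return the first and third lines as questions and the second and fourth as answers.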
Good luck!
Thank you!
Starting at line 2752, the file begins to have lines prefixed with `B:` and `A:`. Can you explain the significance of those prefixes? I don't see anything about them in the code.
They mark person A and person B. These prefixes are filtered out by the algorithm, so they don't matter.
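For illustration, the filtering described above could be done with something like the following sketch. The regex and the function name `strip_speaker` are my assumptions; the repo's actual filter may differ.

```python
import re

# Hedged sketch (assumption, not the repo's actual code): drop the
# "A:" / "B:" speaker markers before feeding lines to the model.
SPEAKER_PREFIX = re.compile(r"^[AB]:\s*")

def strip_speaker(line):
    # Remove a leading "A:" or "B:" (plus any following spaces), if present.
    return SPEAKER_PREFIX.sub("", line)
```

Lines without a prefix pass through unchanged, so the same function can be applied to every line of the file.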
Hi, thanks for this repo. I was looking at `dialog_simple` and I was wondering whether there is some key that separates one dialogue from another, or whether the whole file is treated as a single dialogue. Thanks again!
This is a great example! I'm trying to train on my own set of conversations based on your code, but I can't find the format for movie_data.txt that is referenced in split_qa.py (line 8).
Can you put a sample of that data or describe what my input data needs to look like to work with your infrastructure?
Thanks!