nicolas-ivanov / tf_seq2seq_chatbot

[unmaintained]

About training data order. #12

Closed: p-baleine closed this issue 8 years ago

p-baleine commented 8 years ago

Thank you for your useful repository about seq2seq chatbot.

I'm wondering why the utterances in your training data are in reverse order. For example, in your data:

Okay... then how 'bout we try out some French cuisine.  Saturday?  Night?
Not the hacking and gagging and spitting part.  Please.
Well, I thought we'd start with pronunciation, if that's okay with you.
Can we make this quick?  Roxanne Korrine and Andrew Barrett are having an incredibly horrendous public break- up on the quad.  Again.

But the actual conversation starts from “Can we make this quick?...”. Is there any reason for this?

I tried training TensorFlow's translation model with the Cornell Movie-Dialogs Corpus in normal order and got outputs like macournoyer/neuralconvo's.
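To be clear about what I mean by “normal order”: each utterance is paired with the reply that immediately follows it in the conversation, so the earlier line becomes the encoder input and the later line the decoder target. A rough Python sketch (the function and data here are just illustrative, not code from either repository):

def build_pairs(dialog_lines):
    # dialog_lines: utterances of one conversation, earliest first.
    # Pair each utterance with the reply that follows it.
    return list(zip(dialog_lines, dialog_lines[1:]))

conversation = [
    "Can we make this quick? ...",
    "Well, I thought we'd start with pronunciation, if that's okay with you.",
    "Not the hacking and gagging and spitting part.  Please.",
    "Okay... then how 'bout we try out some French cuisine.  Saturday?  Night?",
]

for context, response in build_pairs(conversation):
    print(context, "->", response)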

nicolas-ivanov commented 8 years ago

@p-baleine, thanks a lot for your comment! Apparently something went wrong when I was getting rid of the excessive data in the Cornell Movie-Dialogs Corpus. I'll try training with the normal order now. By the way, how long did it take you to get results similar to macournoyer/neuralconvo's?

p-baleine commented 8 years ago

I trained macournoyer/neuralconvo's model for 20 epochs with 50,000 examples. In the TensorFlow implementation, a batch of utterances (the default batch size is 64) is randomly sampled and used for training at each step, so I executed 16,000 steps (> 20 * 50,000 / 64), which took about 3 hours on a GTX 1080.
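Roughly, the arithmetic behind that step count (just a back-of-the-envelope sketch, not code from either repository):

num_examples = 50000
num_epochs = 20
batch_size = 64   # default batch size in the TensorFlow translate example

steps_per_epoch = num_examples / batch_size   # 781.25
total_steps = num_epochs * steps_per_epoch    # 15625.0
print(total_steps)  # 15625.0, so the 16000 executed steps cover a bit more than 20 epochs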

Sorry, I don't remember the numbers exactly; if the above numbers are wrong, I'll let you know tomorrow.

nicolas-ivanov commented 8 years ago

Gotcha, thank you! I'll close the issue for now, but you can keep commenting.

p-baleine commented 8 years ago

Sorry for my late reply. The details of the run that gave me a good result are below.

I trained macournoyer/neuralconvo's sample with one layer of size 640 (the layer size was smaller because the GPU, a GTX 660 Ti, did not have enough memory). Then I trained TensorFlow's sample with one layer of size 640 for 15,000 steps, i.e. I executed the following command:

$ python translate.py --size=640 --num_layers=1 --train_dir=$(pwd)/model_layer_1_size_640_gru

It took about 3 hours; the perplexity of bucket 0 was around 8 and the perplexity of bucket 1 was around 15. I got the best output with this setting.
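For context, the per-bucket perplexity that translate.py prints is, as far as I understand, just the exponential of the average per-token cross-entropy loss; a minimal sketch of that relationship (not the actual script code):

import math

def perplexity(avg_cross_entropy_loss):
    # perplexity = exp(average per-token cross-entropy, measured in nats)
    return math.exp(avg_cross_entropy_loss)

print(perplexity(2.08))  # ~8.0, so the bucket 0 value above corresponds to ~2.08 nats/token
print(math.log(15))      # ~2.71 nats/token for a perplexity of 15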

I also tried 2 layers of size 1024, but I haven't gotten good outputs yet.

suriyadeepan commented 8 years ago

@p-baleine What is your vocabulary size? And what are the sizes of your buckets?