oswaldoludwig / Seq2seq-Chatbot-for-Keras

This repository contains a new generative chatbot model based on seq2seq modeling.
Apache License 2.0

Would you please add the attention or pointer mechanism based on your current model? #13

Open · Imorton-zd opened this issue 6 years ago

Imorton-zd commented 6 years ago

Thanks for your repo, which gives me a lot of inspiration. To the best of my knowledge, attention and pointer mechanisms are popular in sequence-to-sequence tasks such as chatbots. I have read the attention mechanisms of Luong et al. 2015 and Bahdanau et al. 2015, and the pointer networks used in some summarization tasks, but I am confused by those formulas. Would you please add some attention or pointer mechanism examples based on your current model?
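(For readers who land here with the same confusion about the formulas: below is a minimal NumPy sketch of Luong-style dot-product attention, only to make the equations concrete. It is not code from this repository, and the function and variable names are illustrative.)

```python
# Minimal NumPy sketch of Luong-style (dot-product) attention.
# Names are illustrative, not taken from this repository.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def luong_attention(decoder_state, encoder_states):
    """decoder_state: (hidden,)   encoder_states: (src_len, hidden)"""
    scores = encoder_states @ decoder_state   # score(h_t, h_s) = h_t . h_s
    weights = softmax(scores)                 # attention weights over source positions
    context = weights @ encoder_states        # context vector: weighted sum of encoder states
    return context, weights

# Bahdanau-style attention differs only in the score function:
# an additive MLP, score(h_t, h_s) = v . tanh(W1 h_t + W2 h_s),
# instead of the plain dot product used above.
```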

oswaldoludwig commented 6 years ago

This is a space for issues, which is not the case here. However, I will keep this thread because it touches on an important point. Attention is important in seq2seq modeling, since it relaxes the constraint of encoding sentences of different lengths into a fixed-dimension thought vector. However, this feature usually only improves performance when you have a long-span context, as in text summarization or translation of a set of sentences. This is not the case for seq2seq chatbots. In fact, I developed this seq2seq model in Keras because I had tried to create a chatbot using the seq2seq models available in TensorFlow for machine translation (with attention), and the result was below my expectations. I can guarantee that you cannot get a better result on this small dataset with any other model (I refer to our model that uses the discriminator of our GAN training method to choose the best answer, the second option in this repo, i.e. conversation_discriminator.py).

Imorton-zd commented 6 years ago

Thanks for your suggestions. In fact, I have tried an attention mechanism for question generation with about 50K Q&A pairs (the average answer length is about 50 and the average question length is about 20). However, the approach with attention performs worse than the plain seq2seq approach. At first, I thought my Keras implementation was at fault; after all, implementing attention in Keras is rather troublesome unless you write a custom layer. For that reason, I think that if I can refer to someone else's implementation, I can judge whether the problem lies with attention itself or with my Keras implementation. Anyway, thank you very much.
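(As a rough illustration of the custom-layer route mentioned above: a minimal sketch of dot-product attention as a custom Keras layer, assuming the Keras 2.x functional API and backend functions. The class name and wiring are hypothetical, not part of this repository.)

```python
# Minimal sketch of dot-product attention as a custom Keras layer (Keras 2.x).
# Class and variable names are illustrative, not from this repository.
from keras import backend as K
from keras.layers import Layer

class DotAttention(Layer):
    """Attends over encoder outputs given the current decoder state."""

    def call(self, inputs):
        # decoder_state: (batch, hidden), encoder_outputs: (batch, src_len, hidden)
        decoder_state, encoder_outputs = inputs
        scores = K.batch_dot(encoder_outputs, decoder_state, axes=[2, 1])  # (batch, src_len)
        weights = K.softmax(scores)                                        # weights over source
        context = K.batch_dot(weights, encoder_outputs, axes=[1, 1])       # (batch, hidden)
        return context

    def compute_output_shape(self, input_shape):
        return (input_shape[0][0], input_shape[1][2])
```

In a functional model it would be wired in roughly as `context = DotAttention()([decoder_state, encoder_outputs])` and the context vector concatenated with the decoder state before the output projection.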