lukalabs / cakechat

CakeChat: Emotional Generative Dialog System
Apache License 2.0

Responses are not context-oriented #21

Closed dieuthu closed 6 years ago

dieuthu commented 6 years ago

Hello, I came across your repository and it's a great project! Thank you for sharing! I trained a "chit-chat" model on it, and it generates sentences that look "correct" but are unfortunately quite "irrelevant" to the user's input. Do you have any suggestions on how to improve the relevance of the responses to the user's input (e.g., which decoding algorithm to choose, how to tune parameters, or how to affect the sampling process)? Thanks!

khalman-m commented 6 years ago

Hello @dieuthu! Thank you for the kind words!

There are many ways to improve the relevance of responses. For example:

  1. The simplest thing you can do is to use the beam search with reranking or sampling with reranking algorithms to predict responses. To do that, set, for example, PREDICTION_MODE_FOR_TESTS = PREDICTION_MODES.beamsearch_reranking here: https://github.com/lukalabs/cakechat/blob/master/cakechat/config.py#L76
     I would also recommend setting MMI_REVERSE_MODEL_SCORE_WEIGHT = 1.0 to avoid simple and dull responses (see the config sketch after this list).
  2. Crawl more data for your training corpus. Generally, the more data you train on, the better the model you get.
  3. Play with hyperparameters. Increasing the depth and the number of neurons may improve the model's ability to learn complex dependencies.
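
Putting point 1 together, the change amounts to two values in cakechat/config.py. A minimal sketch of those lines, assuming the config layout linked above (check your copy of the file for the exact surrounding code):

```python
# cakechat/config.py (fragment) -- the two settings discussed above.

# Generate candidates with beam search, then rerank them using the
# reverse model instead of returning the raw beam results.
PREDICTION_MODE_FOR_TESTS = PREDICTION_MODES.beamsearch_reranking

# Weight of the reverse-model (MMI) score during reranking; a higher
# weight penalizes generic, dull responses more strongly.
MMI_REVERSE_MODEL_SCORE_WEIGHT = 1.0
```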
dieuthu commented 6 years ago

Thanks @mihaha for the reply! I'm going to try out the models as you suggested.

josemf commented 6 years ago

Hi @mihaha

I have a related question about beam search with reranking.

I understand I need to train a model beforehand in order to use the reranking algorithms; I'm just wondering if there is any good practice regarding this first model. If I'm training with 1000 epochs, should the first model also be trained with 1000 epochs? Should both training runs use the same config apart from the prediction mode line?

Thank you! 🙏

khalman-m commented 6 years ago

Hello @josemf !

I think it's okay to use the same architecture and training procedure for the reverse model as for the main one. They need to do the same thing: predict the likelihood of one sequence given another. However, the quality of the main model affects both the generation of candidates and the reranking, while the reverse model only affects the reranking procedure. In this sense, the quality of the main model matters more, so it probably makes sense to make the main model more powerful and train it for a longer period of time.

We didn't perform any experiments regarding this problem though. The current implementation assumes that you have the same architecture and training procedure for both models.
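
To make that division of labor concrete, MMI-style reranking scores each candidate by combining the main (forward) model's log-likelihood with a weighted reverse-model log-likelihood. A schematic sketch of the idea; the scoring functions here are hypothetical stand-ins, not CakeChat's actual API:

```python
def mmi_rerank(context, candidates, log_p_forward, log_p_reverse, mmi_weight=1.0):
    """Rerank beam-search candidates by combined forward/reverse score.

    log_p_forward(context, response) -- log p(response | context), main model.
    log_p_reverse(response, context) -- log p(context | response), reverse model.
    """
    def score(response):
        # The main model shapes both candidate generation (beam search)
        # and this score; the reverse model only affects the reranking.
        # Generic replies ("I don't know") tend to score poorly on the
        # reverse term, which is why MMI reranking suppresses them.
        return (log_p_forward(context, response)
                + mmi_weight * log_p_reverse(response, context))

    return sorted(candidates, key=score, reverse=True)
```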

Thank you for your question!

josemf commented 6 years ago

@mihaha thank you, that answered my question!