Closed: dieuthu closed this issue 6 years ago
Hello @dieuthu! Thank you for the kind words!
There are many ways to improve the relevance of responses. For example, you can set
PREDICTION_MODE_FOR_TESTS = PREDICTION_MODES.beamsearch_reranking
here https://github.com/lukalabs/cakechat/blob/master/cakechat/config.py#L76 and
MMI_REVERSE_MODEL_SCORE_WEIGHT = 1.0
to avoid simple and dull responses.
Thanks @mihaha for the reply! I'm going to try out the models as you suggested.
Hi @mihaha
I have a related question about beam search with reranking.
I understand I need to train a model before using the reranking algorithm. I'm just wondering if there is any good practice regarding this first model. If I'm training for 1000 epochs, should the first model also be trained for 1000 epochs? Should both training runs use the same config apart from the prediction mode line?
Thank you! 🙏
Hello @josemf !
I think it's okay to have the same architecture and training procedure for the reverse model as for the main one. They need to do the same thing: predict the likelihood of one sequence given the other. However, the quality of the main model affects both the generation of candidates and the re-ranking, while the reverse model only affects the re-ranking. In this sense, the quality of the main model is more important, so we should probably make the main model more powerful and train it for longer.
We didn't perform any experiments regarding this problem though. The current implementation assumes that you have the same architecture and training procedure for both models.
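To make the roles of the two models concrete, here is a minimal sketch of re-ranking, not the actual cakechat code. It assumes the main model has already produced candidates with log-likelihoods log p(response | context), and the reverse model has scored the same candidates with log p(context | response). The function name `rerank_candidates`, its parameters, and the convex-combination formula are assumptions made for illustration; the repository's real scoring may differ.

```python
def rerank_candidates(candidates, forward_logprobs, reverse_logprobs, reverse_weight=1.0):
    """Sort candidates by a weighted mix of forward and reverse log-likelihoods.

    Hypothetical helper: the convex combination below is an assumption,
    not necessarily the formula used in cakechat.
    """
    if not (len(candidates) == len(forward_logprobs) == len(reverse_logprobs)):
        raise ValueError("candidates and both score lists must have the same length")
    if not 0.0 <= reverse_weight <= 1.0:
        raise ValueError("reverse_weight must be in [0, 1]")

    scored = []
    for candidate, forward, reverse in zip(candidates, forward_logprobs, reverse_logprobs):
        # The main model contributes `forward`, the reverse model contributes `reverse`.
        score = (1.0 - reverse_weight) * forward + reverse_weight * reverse
        scored.append((score, candidate))

    # Highest combined score first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored]


# Made-up scores: a dull reply is likely under the main model,
# but it says little about the context, so the reverse model scores it low.
candidates = ["i don't know", "i love hiking in the alps"]
forward_logprobs = [-2.0, -5.0]   # log p(response | context), illustrative values
reverse_logprobs = [-9.0, -3.0]   # log p(context | response), illustrative values

print(rerank_candidates(candidates, forward_logprobs, reverse_logprobs, reverse_weight=0.0))
print(rerank_candidates(candidates, forward_logprobs, reverse_logprobs, reverse_weight=1.0))
```

Note that the candidates themselves always come from the main model, whatever the weight is. The reverse model can only reorder them, which is why the main model's quality matters more.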
Thank you for your question
@mihaha thank you, that answered my question!
Hello, I came across your repository and it's a great project! Thank you for sharing! I tried training a "chit-chat" model on it and it generates sentences that look "correct", but unfortunately quite "irrelevant" to the user's input. Do you have any suggestion on how to improve the "relevanceness" of the responses to the user's input? (e.g., which decoding algorithm to choose, tuning parameters, or how to affect the sampling process?) Thanks!