lukalabs / cakechat

CakeChat: Emotional Generative Dialog System
Apache License 2.0

How to improve the responses of the model? #37

Closed sanjanaargula closed 6 years ago

nicolas-ivanov commented 6 years ago

@sanjanaargula do you still have the question?

sanjanaargula13 commented 6 years ago

Yes

nicolas-ivanov commented 6 years ago

Ok, but I don't see the description of the issue )

sanjanaargula13 commented 6 years ago

Description: After compiling and training the model, when I start the server and client, questions like "How are you?", "How was your day?", or even a simple "Hi" produce weird responses like:

"Movies" "Yes Eddie, but what much"

nicolas-ivanov commented 6 years ago

Apparently, you did not train your model long enough, or your model overfit the data.

How long did you train and how much train data do you have? Did you initialise your model with the provided trained weights?

sanjanaargula13 commented 6 years ago

I followed the steps they gave.

tools/download_model.py

bin/cakechat_server.py

nicolas-ivanov commented 6 years ago

Just checked the trained model, it looks fine to me:

[screenshot: sample responses from the trained model]

Model weights or index files might have gotten messed up while you were experimenting with the model. Try running tools/download_model.py again and make sure that the following files have been updated:
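To confirm that the re-download actually refreshed the files, one option is to compare modification times before and after running the script. A minimal sketch (the helper names are mine, and the paths passed in would be whatever files the download script writes):

```python
import os


def snapshot_mtimes(paths):
    """Record the last-modified time of each file that currently exists."""
    return {p: os.path.getmtime(p) for p in paths if os.path.exists(p)}


def updated_files(before, after):
    """Return the paths whose mtime changed, or that newly appeared."""
    return [p for p in after if after[p] != before.get(p)]
```

Usage: take a snapshot of the model files, run tools/download_model.py, take a second snapshot, and diff them with updated_files; any file missing from the result was not refreshed.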

sanjanaargula commented 6 years ago

I am getting the following response when I am trying to train the model. Of the three files you specified, the .bin file is not getting downloaded because of an AWS error.

[screenshot of the AWS error]

nicolas-ivanov commented 6 years ago

I am getting the following response when I am trying to train the model.

You need to end the other process in order to release the file. Are you trying to train several models at the same time? If so, don't. If not, a process from a previous launch may not have stopped correctly; a system reboot should fix the issue.
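The "file held by another process" situation can be checked with an advisory lock. A Unix-only sketch using the stdlib fcntl module (this helper is purely illustrative and is not part of CakeChat, which may not use flock at all):

```python
import fcntl


def is_locked(path):
    """Return True if another open file description holds an exclusive lock."""
    with open(path, "a") as f:
        try:
            # Non-blocking attempt: fails immediately if someone holds the lock.
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.flock(f, fcntl.LOCK_UN)
            return False
        except BlockingIOError:
            return True
```

If this reports the file as locked after all training runs have supposedly stopped, a leftover process is still holding it, which matches the "reboot fixes it" advice above.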

nicolas-ivanov commented 6 years ago

In any case consider the following:

  1. Use a big corpus to train your model on. The one provided in the repo (data/corpora_processed/train_processed_dialogs.txt) is just a dummy sample that shows the required structure of the document. Unfortunately, for privacy reasons we can't provide the original corpus that was used for training the model. See this answer to get one of the publicly available dialog corpora. You need to prepare the corpus so that it has the same structure as the provided sample. After that you can replace data/corpora_processed/train_processed_dialogs.txt and start training.

  2. When you run python prepare_index_files.py, the script takes the train corpus (data/corpora_processed/train_processed_dialogs.txt) and builds the tokens index (data/tokens_index/t_idx_processed_dialogs.json), overwriting the original tokens index file. Since you only have access to a dummy train corpus, this operation screws up the original tokens index file. The word2vec model name depends on the vocabulary stored in the tokens index file, which is why the proper w2v model can't be downloaded from AWS S3. We're going to update the documentation to avoid this confusing behavior in the future. For now: don't run python prepare_index_files.py unless you 1. want to train your model from scratch and 2. have a large corpus to train it on. To fix the problem, see this answer.

  3. Use a GPU to train your model on; in this case it takes 5-10 days to train the model from scratch. A single CPU would take years to do the same job.
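Regarding the corpus structure in point 1: assuming each line of train_processed_dialogs.txt is a JSON-encoded dialog, i.e. a list of utterance objects (the exact field names below are an assumption, so compare against the provided sample file), a quick validator for a prepared corpus might look like:

```python
import json


def validate_corpus(path):
    """Check that every non-empty line parses as a JSON list of utterance dicts.

    NOTE: the expected per-utterance fields ("text") are an assumption;
    verify them against data/corpora_processed/train_processed_dialogs.txt.
    """
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                dialog = json.loads(line)
            except json.JSONDecodeError as e:
                raise ValueError(f"line {lineno}: invalid JSON ({e})")
            if not isinstance(dialog, list) or not all(
                isinstance(u, dict) and "text" in u for u in dialog
            ):
                raise ValueError(f"line {lineno}: not a list of utterances")
```

Running a check like this on a converted corpus before replacing the sample file catches formatting mistakes early, instead of partway through a multi-day training run.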

sanjanaargula commented 6 years ago

In any case consider the following: …

Suggestion 2 worked perfectly. The program gave perfect replies as expected. Thanks a lot!

nicolas-ivanov commented 6 years ago

Great!