Closed sanjanaargula closed 6 years ago
Yes
Ok, but I don't see the description of the issue )
Description: After compiling and training the model, I start the server and client, but questions like "How are you?", "How was your day?" or even a simple "Hi" get weird responses like "Movies" or "Yes Eddie, but what much".
Apparently, you did not train your model enough or your model overfitted the data.
How long did you train, and how much training data do you have? Did you initialise your model with the provided trained weights?
I followed the steps they gave.
tools/download_model.py
bin/cakechat_server.py
Just checked the trained model, it looks fine to me:
Model weights or index files might have gotten messed up while you experimented with the model. Try running tools/download_model.py again and make sure that the following files have been updated:
I am getting the following response when I am trying to train the model. Of the three files you specified, the .bin file is not being downloaded because of some AWS error.
I am getting the following response when I am trying to train the model.
You need to end the other process in order to release the file. Are you trying to train several models at the same time? If so, don't. If not, there may be a process from your previous launch that did not stop correctly; a system reboot should fix the issue.
In any case, consider the following:
- Use a big corpus to train your model on. The one provided in the repo (data/corpora_processed/train_processed_dialogs.txt) is just a dummy sample that shows the required structure of the document. Unfortunately, for privacy reasons we can't provide the original corpus that was used for training the model. See this answer to get one of the publicly available dialog corpora. You need to prepare the corpus so that it has the same structure as the provided sample. After that, you can replace data/corpora_processed/train_processed_dialogs.txt and start training.
- When you run python prepare_index_files.py, the script takes the train corpus (data/corpora_processed/train_processed_dialogs.txt) and builds the tokens index (data/tokens_index/t_idx_processed_dialogs.json), overwriting the original tokens index file. Since for now you only have access to a dummy train corpus, this operation corrupts the original tokens index file. The word2vec model name depends on the vocabulary stored in the tokens index files, which is why the proper w2v model can't be downloaded from AWS S3. We're going to update the documentation to avoid this confusing behavior in the future. For now: don't run python prepare_index_files.py unless you (1) want to train your model from scratch, (2) have a large corpus to train your model on. To fix the problem, see this answer.
- Use a GPU to train your model. In this case it takes 5-10 days to train the model from scratch; one CPU will take years to do the same job.
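Since prepare_index_files.py overwrites the tokens index in place, one cautious habit is to copy that file aside before running anything that rewrites it. The sketch below is not part of CakeChat: backup_file and restore_file are hypothetical helpers written for illustration, and the only thing taken from this thread is the tokens index path, which assumes the script is run from the repo root.

```python
import shutil
import time
from pathlib import Path

# Path as quoted in this thread; adjust it if your checkout differs.
TOKENS_INDEX = Path("data/tokens_index/t_idx_processed_dialogs.json")


def backup_file(path, suffix=None):
    """Copy `path` next to itself and return the path of the copy.

    Hypothetical helper: by default the copy gets a timestamped
    ".YYYYmmdd-HHMMSS.bak" suffix so repeated backups do not collide.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"nothing to back up at {path}")
    suffix = suffix or time.strftime(".%Y%m%d-%H%M%S.bak")
    backup = path.with_name(path.name + suffix)
    shutil.copy2(path, backup)  # copy2 also preserves timestamps
    return backup


def restore_file(backup, original):
    """Overwrite `original` with the contents of `backup`."""
    shutil.copy2(backup, original)
    return Path(original)


if __name__ == "__main__":
    if TOKENS_INDEX.is_file():
        print("Saved copy:", backup_file(TOKENS_INDEX))
    else:
        print("Tokens index not found at", TOKENS_INDEX)
```

If the index does get overwritten by a run on the dummy corpus, restore_file(saved_copy, TOKENS_INDEX) puts the earlier version back. This only helps if the backup was taken before the overwrite.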
Suggestion 2 worked perfectly. The program gave perfect replies as expected. Thanks a lot!
Great!
@sanjanaargula do you still have the question?