Closed: maxymy98 closed this issue 6 years ago
Hi @maxymy98
That's because you're trying to fine-tune the pre-trained model (which was trained with a 50,000-token dictionary), but your new (dummy) dataset has far fewer distinct tokens in it (39).
I think you can just re-use the index_to_token.json file from the pre-trained model if you want to fine-tune it on a new corpus.
@maxymy98 It seems you have the wrong token index file. Check data/tokens_index/t_idx_processed_dialogs.json – it should contain 50k distinct token entries, not 39.
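A quick way to verify this is to count the entries in the token-index JSON. This is a minimal sketch (the `token_index_size` helper is hypothetical, not part of the repo); the demonstration below runs on a throwaway three-token file, but for the real check you would point it at data/tokens_index/t_idx_processed_dialogs.json and expect roughly 50000 entries:

```python
import json
import tempfile

def token_index_size(index_path):
    """Return the number of distinct tokens in a token-index JSON file."""
    with open(index_path) as f:
        return len(json.load(f))

# Demonstration on a throwaway 3-token index file; for the real check,
# call token_index_size("data/tokens_index/t_idx_processed_dialogs.json")
# and expect ~50000 entries for the pre-trained model, not 39.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"hello": 0, "world": 1, "<unk>": 2}, tmp)

size = token_index_size(tmp.name)
print(size)  # 3 for this dummy file
```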
The original index file may have been replaced if you ran the tools/prepare_index_files.py script on the dummy corpus.
How to fix: delete data/tokens_index/t_idx_processed_dialogs.json and re-run the tools/download_model.py script; it should restore the original index file.

@nicolas-ivanov @nsmetanin Thank you so much for the quick reply. The issue is resolved!
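For reference, the fix described above (drop the stale index file, then re-run the downloader) could be sketched like this; the path is the one from the thread, and this is only an illustration, not a script shipped with the repo:

```python
import os

# Sketch of the fix: remove the index file that was overwritten by
# tools/prepare_index_files.py, then re-run tools/download_model.py
# to fetch the original ~50k-token index again.
stale_index = "data/tokens_index/t_idx_processed_dialogs.json"

if os.path.exists(stale_index):
    os.remove(stale_index)
    print("Removed stale index; now re-run: python tools/download_model.py")
else:
    print("No stale index found at", stale_index)
```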
Awesome!
When I was training the model, I got this parameter mismatch error. I'm using Windows and Anaconda with Python 2.7, and the training corpus is the provided dummy corpus. I did not use Docker, since docker-gpu is not supported on Windows. Thanks a lot!