[Open] Daichi-Kudo opened this issue 3 years ago
I got the error below when running:

```
python youyakuman.py -txt_file testjp.txt -lang jp -n 3 --super_long
```

```
RuntimeError: Error(s) in loading state_dict for ModelLoader: size mismatch for bert.model.embeddings.word_embeddings.weight: copying a param with shape torch.Size([32006, 768]) from checkpoint, the shape in current model is torch.Size([32000, 768]).
```

Packages: torch==1.3.0, transformers==2.9.0, googletrans==2.4.0, pyknp==0.4.1

I got the pretrained model from the link in README.md. Am I doing something wrong, or do I need to start collecting data and training a model myself?

Hi, I believe the error is due to a different version of the Juman dictionary. The difference between 32006 and 32000 looks like the tokenizer's vocabulary size. Unfortunately, I did not keep the original version of Juman, so I can't tell which version you should use. You would need to collect data and train the model again to reproduce the result.
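If it helps to verify the mismatch before retraining, here is a minimal diagnostic sketch (not part of youyakuman): it compares the vocabulary size baked into the downloaded checkpoint against the tokenizer you have locally. The checkpoint file name and the tokenizer directory are placeholders; the state_dict key is taken from the traceback above.

```python
import torch
from transformers import BertTokenizer

# Load the downloaded checkpoint on CPU. "checkpoint.pt" is a placeholder;
# point it at the pretrained model file from the README link.
state = torch.load("checkpoint.pt", map_location="cpu")
if "model" in state:                   # some checkpoints nest the state_dict
    state = state["model"]

# Vocabulary size the checkpoint was trained with (first embedding dimension).
ckpt_vocab = state["bert.model.embeddings.word_embeddings.weight"].shape[0]
print("checkpoint vocab:", ckpt_vocab)  # 32006 per the traceback

# Vocabulary size of the tokenizer on your machine. "/path/to/japanese_bert"
# is a placeholder for the directory holding your Japanese BERT vocab files.
tok = BertTokenizer.from_pretrained("/path/to/japanese_bert")
print("local vocab:", len(tok))         # 32000 here confirms the dictionary mismatch
```

As a last-resort hack, calling `model.resize_token_embeddings(32006)` on the current model before loading would make the shapes match so `load_state_dict` succeeds, but the six extra embedding rows correspond to dictionary entries your Juman version does not produce, so, as the reply below notes, retraining is the dependable fix.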