I have a strange problem.
When running spm.SentencePieceTrainer.Train(" ".join(sys.argv[1:])) with
model_type: UNIGRAM
vocab_size: 600
I get the following error:
unigram_model_trainer.cc(247) LOG(INFO) Making suffix array...
unigram_model_trainer.cc(251) LOG(INFO) Extracting frequent sub strings... node_num=26183214
unigram_model_trainer.cc(301) LOG(INFO) Initialized 972301 seed sentencepieces
unigram_model_trainer.cc(150) [!std::isnan(score)]
Program terminated with an unrecoverable error.
Command exited with non-zero status 255
I found the line and the word in the input text that trigger the error. If I change one letter in this word to any other letter, the error does not occur:
unigram_model_trainer.cc(247) LOG(INFO) Making suffix array...
unigram_model_trainer.cc(251) LOG(INFO) Extracting frequent sub strings... node_num=26183216
unigram_model_trainer.cc(301) LOG(INFO) Initialized 969740 seed sentencepieces
trainer_interface.cc(597) LOG(INFO) Tokenizing input sentences with whitespace: 443829
trainer_interface.cc(608) LOG(INFO) Done! 290894
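For reference, the training call described above corresponds roughly to the script below. This is only a minimal sketch: the helper name build_train_args, the file name corpus.txt and the model prefix m are placeholders for illustration, not the actual values used. The flags --input, --model_prefix, --model_type and --vocab_size are standard SentencePiece trainer options.

```python
import sys

# Hypothetical example flags; corpus.txt and m are placeholder names.
EXAMPLE_FLAGS = [
    "--input=corpus.txt",
    "--model_prefix=m",
    "--model_type=unigram",
    "--vocab_size=600",
]


def build_train_args(argv):
    """Join command-line flags into the single string passed to Train()."""
    return " ".join(argv)


def main(argv):
    # Imported here so the helper above works without sentencepiece installed.
    import sentencepiece as spm

    # Same call as in the report: all flags are forwarded as one string.
    spm.SentencePieceTrainer.Train(build_train_args(argv))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Assuming the script is saved as train.py, it would be invoked as: python train.py --input=corpus.txt --model_prefix=m --model_type=unigram --vocab_size=600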
What is the reason for this behavior?
Thank you in advance.