Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/fkurushin/entity-classification/venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 989, in Train
SentencePieceTrainer._Train(arg=arg, **kwargs)
File "/home/fkurushin/entity-classification/venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 982, in _Train
return SentencePieceTrainer._TrainFromMap(new_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fkurushin/entity-classification/venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 927, in _TrainFromMap
return _sentencepiece.SentencePieceTrainer__TrainFromMap(args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Internal: src/trainer_interface.cc(661) [(trainer_spec_.vocab_size()) == (model_proto->pieces_size())] Vocabulary size too high (500000). Please set it to a value <= 455361.
When I set vocab_size to 400 000, I get:
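For reference, the retried call looks roughly like this (a minimal sketch, not my exact command: the corpus path and model_type are assumptions; model_prefix "m" matches the "Saving model: m.model" log line):

```python
import sentencepiece as spm

# Hypothetical reproduction of the failing training run.
# "corpus.txt" stands in for the real training file.
spm.SentencePieceTrainer.train(
    input="corpus.txt",     # assumed corpus path
    model_prefix="m",       # produces m.model / m.vocab
    model_type="unigram",   # assumed (the default)
    vocab_size=400_000,     # raises the RuntimeError shown below
)
```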
trainer_interface.cc(686) LOG(INFO) Saving model: m.model
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/fkurushin/entity-classification/venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 989, in Train
SentencePieceTrainer._Train(arg=arg, **kwargs)
File "/home/fkurushin/entity-classification/venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 982, in _Train
return SentencePieceTrainer._TrainFromMap(new_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fkurushin/entity-classification/venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 927, in _TrainFromMap
return _sentencepiece.SentencePieceTrainer__TrainFromMap(args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Internal: src/trainer_interface.cc(661) [(trainer_spec_.vocab_size()) == (model_proto->pieces_size())] Vocabulary size too high (400000). Please set it to a value <= 334995.
Can anyone explain why this limit exists? Note that 400 000 is already below the cap of 455361 reported by the first run, yet the second run reports an even lower cap of 334995.