google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.
Apache License 2.0

I want to obtain a model file using my vocab! #1017

Closed: scj0709 closed this issue 3 months ago

scj0709 commented 3 months ago

Hello! I have my own vocab file. For example, it looks like this: [image]. I want to get a .model file from this vocab file. How should I do it? Please help me.

taku910 commented 3 months ago

You can write the model file manually, because it is stored as a serialized protobuf message. However, manually creating or modifying model files is unsupported, so do it at your own risk.

https://github.com/google/sentencepiece/issues/473