google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.
Apache License 2.0
10.07k stars 1.16k forks

adding vocab_size consistency #1012

Closed. Cassini-chris closed this 3 months ago

Cassini-chris commented 3 months ago

Uses --vocab_size=2000 consistently for the SentencePiece trainer and fixes various spelling mistakes.
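
One way to keep the value from drifting between trainer invocations is to define it once and build every flag string from it. The following is only a minimal sketch, not code from this pull request: the helper name `trainer_flags`, the constant `VOCAB_SIZE`, and the placeholder paths `corpus.txt` and `m` are assumptions made for illustration.

```python
# Hypothetical helper (not part of the PR): one shared vocab size for all commands.
VOCAB_SIZE = 2000  # the value named in the PR description


def trainer_flags(input_path: str, model_prefix: str,
                  vocab_size: int = VOCAB_SIZE) -> str:
    """Build a flag string in the --key=value form that spm_train expects."""
    return (
        f"--input={input_path} "
        f"--model_prefix={model_prefix} "
        f"--vocab_size={vocab_size}"
    )


# Placeholder corpus path and model prefix, chosen only for this example.
flags = trainer_flags("corpus.txt", "m")
print(flags)
```

The resulting string can be passed to the spm_train command line tool, or to `sentencepiece.SentencePieceTrainer.train(flags)` when the Python package is installed.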