google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.
Apache License 2.0
10.33k stars 1.18k forks

adding vocab_size consistency #1012

Closed. Cassini-chris closed this 6 months ago

Cassini-chris commented 6 months ago

Makes --vocab_size=2000 consistent for the SentencePiece trainer. Also fixes various spelling mistakes.
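
A consistency change like this one can be checked mechanically. The sketch below is not part of the PR; it is a minimal illustration that scans a piece of text for `--vocab_size` flags and reports any value that differs from an expected one. The helper names (`find_vocab_sizes`, `inconsistent_vocab_sizes`), the default of 2000, and the sample text are all assumptions made for the example.

```python
import re

# Matches both "--vocab_size=2000" and "--vocab_size 2000".
VOCAB_SIZE_RE = re.compile(r"--vocab_size[=\s]+(\d+)")


def find_vocab_sizes(text):
    """Return (line_number, value) pairs for each --vocab_size occurrence."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in VOCAB_SIZE_RE.finditer(line):
            hits.append((line_number, int(match.group(1))))
    return hits


def inconsistent_vocab_sizes(text, expected=2000):
    """Return the occurrences whose value differs from `expected`."""
    return [(n, v) for n, v in find_vocab_sizes(text) if v != expected]


if __name__ == "__main__":
    # Hypothetical documentation snippet, invented for this example.
    sample = (
        "spm_train --input=data.txt --model_prefix=m --vocab_size=2000\n"
        "spm_train --input=data.txt --model_prefix=m --vocab_size=8000\n"
    )
    for line_number, value in inconsistent_vocab_sizes(sample):
        print(f"line {line_number}: --vocab_size={value} (expected 2000)")
```

Running the same check over each documentation file would list the lines that still need updating.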