google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.
Apache License 2.0
10.32k stars, 1.18k forks

Add missing output formats to spm_encode flag documentation #1002

Closed mcognetta closed 6 months ago

mcognetta commented 7 months ago

The sample_(piece|id|proto) output formats were listed in the GitHub README but not in the actual flag docs for spm_encode, which made it hard to work out how to do subword regularization from the command line.
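
For context, a rough sketch of the kind of invocation this is about, based on the sampling usage shown in the README. The file names (m.model, input.txt, output.txt) are hypothetical placeholders, and the --nbest_size and --alpha values are only illustrative, not recommended settings:

```shell
# Sample one segmentation per input line instead of the single best one.
# m.model, input.txt and output.txt are placeholder names (assumptions).
spm_encode --model=m.model \
  --output_format=sample_piece \
  --nbest_size=-1 \
  --alpha=0.5 \
  < input.txt > output.txt
```

Swapping sample_piece for sample_id should emit vocabulary ids rather than pieces. Because the output is sampled, repeated runs on the same input can produce different segmentations.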