google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.
Apache License 2.0
10.07k stars 1.16k forks

Add missing output formats to spm_encode flag documentation #1002

mcognetta closed 4 months ago

mcognetta commented 4 months ago

sample_(piece|id|proto) was listed in the GitHub README but not in the actual spm_encode flag docs, so it was hard to know how to do subword regularization from the command line.
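
For readers who want to see what such an invocation might look like, here is a minimal sketch that only assembles an spm_encode argument list with one of the sampling output formats named above. The helper name `build_sample_command`, the placeholder model path, and the default values for `--nbest_size` and `--alpha` are assumptions for illustration and are not taken from this PR. Check `spm_encode --help` for the authoritative flag list.

```python
import shlex

# Sampling output formats named in the PR description.
SAMPLE_FORMATS = ("sample_piece", "sample_id", "sample_proto")


def build_sample_command(model_path, output_format="sample_piece",
                         nbest_size=-1, alpha=0.5):
    """Build an argv list for spm_encode with a sampling output format.

    Hypothetical helper: it does not run anything, it only assembles flags.
    The default nbest_size and alpha are illustrative, not recommendations.
    """
    if output_format not in SAMPLE_FORMATS:
        raise ValueError(
            f"expected one of {SAMPLE_FORMATS}, got {output_format!r}"
        )
    return [
        "spm_encode",
        f"--model={model_path}",
        f"--output_format={output_format}",
        f"--nbest_size={nbest_size}",
        f"--alpha={alpha}",
    ]


if __name__ == "__main__":
    # "m.model" is a placeholder path, not a file referenced by the PR.
    print(shlex.join(build_sample_command("m.model")))
```

spm_encode reads text from standard input when no input file is given, so the resulting list can be passed to `subprocess.run` with an open text file as `stdin`. Because the segmentation is sampled, repeated runs on the same input are expected to produce different outputs.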