Closed: eigen2017 closed this issue 10 months ago.
I think the latency you measured includes the LLM encoding, not just tokenization. SentencePiece tokenization on its own should be fast enough.
Encoding 50,738 English words takes only about 40 ms (and that figure includes model initialization, so steady-state encoding is usually even faster).
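If you want to check whether tokenization itself is the bottleneck, a minimal timing sketch like the one below can isolate it from the model forward pass. It assumes the `sentencepiece` Python package and a `tokenizer.model` file at a placeholder path; it is not code from this repository.

```python
import time

import sentencepiece as spm

# Load a SentencePiece model; the path is a placeholder for whatever
# tokenizer.model file ships with the LLM in question.
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

# A roughly 300-token prompt, mirroring the prompt length mentioned above.
prompt = "the quick brown fox jumps over the lazy dog " * 40

# Warm-up call so the measurement excludes one-time initialization cost.
sp.encode(prompt)

start = time.perf_counter()
ids = sp.encode(prompt)  # tokenization only, no model forward pass
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{len(ids)} tokens in {elapsed_ms:.3f} ms")
```

If this prints well under a millisecond while the end-to-end call still takes ~100 ms, the time is being spent in the model encoding step rather than in SentencePiece.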
With a prompt length of 300, encode latency comes to about 100 ms. Is any improvement planned? A CUDA version might be one way. Thanks.