sagemathinc / cocalc

CoCalc: Collaborative Calculation in the Cloud
https://CoCalc.com

llm: switch to gpt-tokenizer #7538

Open haraldschilly opened 2 months ago

haraldschilly commented 2 months ago

In the frontend we're using gpt3-tokenizer. I propose switching to gpt-tokenizer because in my tests it is not only roughly 10x faster, but it also supports generators. Generators are handy because they make truncating text to a token limit much easier: you can stop tokenizing as soon as the limit is reached instead of encoding the whole text first.
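A minimal sketch of why a generator-based tokenizer simplifies truncation. The toy word-level `encodeGenerator` below is a stand-in for the streaming API of gpt-tokenizer (whose actual function names and token format may differ); the point is that the `truncate` helper pulls tokens lazily and breaks out early, so text beyond the limit is never tokenized at all.

```typescript
// Toy word-level tokenizer standing in for a streaming encoder such as
// gpt-tokenizer's generator API (hypothetical here; real API may differ).
function* encodeGenerator(text: string): Generator<string> {
  for (const token of text.split(/\s+/)) {
    if (token) yield token;
  }
}

// Truncate to at most `limit` tokens without tokenizing the whole input:
// stop consuming the generator as soon as the limit is reached.
function truncate(text: string, limit: number): string {
  const tokens: string[] = [];
  for (const token of encodeGenerator(text)) {
    if (tokens.length >= limit) break;
    tokens.push(token);
  }
  return tokens.join(" ");
}

console.log(truncate("one two three four five", 3)); // → "one two three"
```

With an eager tokenizer the entire input must be encoded before slicing, which is wasteful for long documents; the generator version does O(limit) work regardless of input length.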