koaning / tokenwiser

Bag of, not words, but tricks!
https://koaning.github.io/tokenwiser/
Apache License 2.0

Add support for pretrained BytePair tokenisers #36

Closed by koaning 3 years ago

koaning commented 3 years ago

We could add support for sentencepiece, which in turn could load the pretrained BytePair tokenisers.
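
A rough sketch of what this could look like, not the actual tokenwiser implementation. `SubwordPrep` and `make_sentencepiece_encoder` are hypothetical names chosen for illustration, and the sketch assumes the pretrained tokeniser ships as a sentencepiece `.model` file. The step only follows the scikit-learn `fit`/`transform` convention; it does not import scikit-learn.

```python
from typing import Callable, Iterable, List


def make_sentencepiece_encoder(model_path: str) -> Callable[[str], List[str]]:
    """Return a function that splits text into subword pieces.

    Assumes the `sentencepiece` package is installed and that `model_path`
    points at a pretrained `.model` file (the path is up to the caller).
    """
    import sentencepiece as spm  # imported lazily so the rest runs without it

    processor = spm.SentencePieceProcessor(model_file=model_path)
    return lambda text: processor.encode(text, out_type=str)


class SubwordPrep:
    """Hypothetical text-prep step: raw text in, space-joined subword pieces out."""

    def __init__(self, encode: Callable[[str], List[str]]):
        # Any callable mapping a string to a list of pieces works here.
        self.encode = encode

    def fit(self, X: Iterable[str], y=None) -> "SubwordPrep":
        # Nothing to learn: the tokeniser is already pretrained.
        return self

    def transform(self, X: Iterable[str]) -> List[str]:
        # Join the pieces with spaces so a downstream vectoriser can split on them.
        return [" ".join(self.encode(text)) for text in X]
```

With a downloaded model file this would be used as `SubwordPrep(make_sentencepiece_encoder("path/to/pretrained.model")).transform(["some text"])`. Taking the encoder as a plain callable keeps the step independent of where the pretrained model comes from.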

koaning commented 3 years ago

On it: https://github.com/koaning/tokenwiser/pull/41