koaning / tokenwiser

Bag of, not words, but tricks!
https://koaning.github.io/tokenwiser/
Apache License 2.0
68 stars 7 forks

Redefine Embeddings Based on Subword Probabilities #4

Closed: koaning closed this issue 3 years ago

koaning commented 3 years ago

It might make sense to allow for something like this: https://www.aclweb.org/anthology/2020.findings-emnlp.53.pdf

koaning commented 3 years ago

Only tokenize the sub-tokens according to this scheme: (image)