Hey there :space_invader:
Here's an article on why and how to replace current tokenizers.

The model behind it is called tokun: it specializes in text embeddings, and it produces much denser and more meaningful vectors than traditional tokenizers.

The link to Hugging Face (at the end of the article) is not yet valid: I still have to export my TensorFlow model first :)