I want to tokenize at the word level without spacers or joiners. Is that possible?
In fact, I want to leverage pretrained embeddings, and I'm not able to do so when the tokens carry spacers and joiners.
Also, is it possible to keep joiners and spacers and still leverage the embeddings effectively? My pretrained embeddings do not carry any spacers or joiners.
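To make the mismatch concrete, here is a minimal sketch of one possible workaround: stripping the marker characters from the tokens before looking them up in the embedding table. This assumes SentencePiece-style spacers ("▁") and OpenNMT-style joiners ("￭"); your tokenizer may use different symbols, and this only helps when the tokenization granularity otherwise matches the embedding vocabulary.

```python
# Assumed marker characters; adjust to whatever your tokenizer emits.
JOINER = "￭"  # OpenNMT Tokenizer joiner (assumption)
SPACER = "▁"  # SentencePiece-style spacer (assumption)

def strip_markers(tokens):
    """Remove spacer/joiner annotations so tokens can be looked up
    in a plain word-level embedding vocabulary."""
    cleaned = []
    for tok in tokens:
        tok = tok.replace(JOINER, "").replace(SPACER, "")
        if tok:  # drop tokens that consisted only of a marker
            cleaned.append(tok)
    return cleaned

print(strip_markers(["▁Hello", "▁world", "￭!"]))  # → ['Hello', 'world', '!']
```

Note that this mapping is lossy: after stripping, you can no longer tell which tokens were attached to their neighbors, so it suits embedding lookup but not detokenization.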