retorquere closed this issue 7 months ago
Hi @retorquere
Thanks for highlighting the word-joiner issue to us.
Currently, winkNLP does not handle this case.
We will take up this issue shortly after our upcoming major release, which adds word embeddings support to winkNLP.
We will keep you posted.
Best, Rachna
@retorquere, we have released version 1.7.0 of the model, which now supports the word joiner and accented characters; winkNLP itself remains unchanged.
Would it be possible for text "separated" by the word-joiner character (U+2060) to be treated as one word? For example,
'main\u2060tain'
would be one word.
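
For anyone on an older model version, one possible workaround is to remove the word joiner before the text reaches the tokenizer. The sketch below is plain JavaScript and does not use any winkNLP API; `stripWordJoiner` and `naiveTokens` are hypothetical helpers, and the whitespace split only stands in for a real tokenizer.

```javascript
// Hypothetical pre-processing helper, not part of winkNLP.
// Matches every WORD JOINER (U+2060) character in a string.
const WORD_JOINER = /\u2060/g;

// Remove all word joiners so the surrounding letters form one word.
function stripWordJoiner(text) {
  return text.replace(WORD_JOINER, '');
}

// Naive whitespace tokenizer, used here only for illustration.
function naiveTokens(text) {
  return stripWordJoiner(text).split(/\s+/).filter(Boolean);
}

const tokens = naiveTokens('We main\u2060tain the code');
console.log(tokens); // [ 'We', 'maintain', 'the', 'code' ]
```

The trade-off is that the joiner is lost, so character offsets in the cleaned string no longer line up with the original text.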