Closed aljbri closed 3 years ago
Salam, thank you for your message. The tokenization process only separates words from text; it doesn't perform any analysis on the words. If you want to get lemmas or stems from words, I suggest using the Qalsadi morphological analyzer, or you can use just a stemmer like Tashaphyne to extract stems.
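For reference, a minimal sketch (not from the original thread) of extracting a stem with Tashaphyne, as suggested above; the sample word is an assumption:

```python
# Minimal sketch: light stemming with Tashaphyne (word below is a made-up example)
from tashaphyne.stemming import ArabicLightStemmer

stemmer = ArabicLightStemmer()
word = u"والكتاب"               # hypothetical word with the conjunction و attached
stemmer.light_stem(word)        # segment the word into prefix / stem / suffix
print(stemmer.get_stem())       # light stem without affixes
print(stemmer.get_root())       # extracted root
```

For lemmas rather than stems, Qalsadi provides a morphological analyzer/lemmatizer (e.g. `qalsadi.lemmatizer`), whose exact API depends on the installed release.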
In the Tokenize part, it doesn't separate the character و from the word when it is not part of the original word, as in the example:
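(The original example did not come through; a hypothetical illustration of the behavior described, assuming it refers to pyarabic's `araby.tokenize`:)

```python
# Hypothetical illustration, not the reporter's original example:
# tokenization only splits on spaces and punctuation, so the conjunction و
# stays attached to the following word.
from pyarabic import araby

text = u"والكتاب جديد"          # assumed sample text
print(araby.tokenize(text))     # -> ['والكتاب', 'جديد']  (و is not separated)
```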