The sentence should be lemmatized before MWE tokenization, because when a multi-word expression is a verb, it is conjugated to agree with the subject.
For example: "Em şermezar dikin"
With the current implementation, MWE tokenization does not recognize "şermezar dikin" in the sentence above, because the only available forms are "şermezar kirin" and "şermezarkirin". There is no "şermezar dikin" form, which makes sense, since only the infinitive is listed. So the sentence should be lemmatized first, and the MWE lookup should run on the lemmas.
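
To make the proposal concrete, here is a minimal sketch of the lemmatize-then-match order. Everything in it is an assumption for illustration: `MWE_LEXICON`, `LEMMAS`, `lemmatize` and `mwe_tokenize` are hypothetical names and toy data, not the toolkit's actual API or lexicon, and only two-token expressions are handled.

```python
# Hypothetical toy data: NOT the toolkit's real lexicon or lemmatizer.
# MWE entries are keyed by their lemmatized (infinitive) token sequence.
MWE_LEXICON = {("şermezar", "kirin")}

# Assumed mapping from a conjugated form to its infinitive.
LEMMAS = {"dikin": "kirin"}


def lemmatize(token):
    """Return the lemma of a token, or the token itself if unknown."""
    return LEMMAS.get(token, token)


def mwe_tokenize(tokens, lemmatize_first=True):
    """Merge two-token MWEs, optionally matching on lemmas instead of surface forms."""
    keys = [lemmatize(t) for t in tokens] if lemmatize_first else list(tokens)
    result = []
    i = 0
    while i < len(tokens):
        if tuple(keys[i:i + 2]) in MWE_LEXICON:
            # Match found on the lookup keys; keep the original surface forms.
            result.append(" ".join(tokens[i:i + 2]))
            i += 2
        else:
            result.append(tokens[i])
            i += 1
    return result


tokens = "Em şermezar dikin".split()
print(mwe_tokenize(tokens, lemmatize_first=False))  # surface forms only
print(mwe_tokenize(tokens, lemmatize_first=True))   # lemmas used for lookup
```

With surface forms only, "şermezar dikin" is left as two separate tokens; with lemmas used for the lookup, it is merged into one MWE token while the conjugated form is preserved in the output.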