Library for document analysis (segmentation, tokenization, normalization, aggregation) whose goal is to produce a set of items that can be inserted into a strus storage. It also provides functions for analysing tokens or phrases of a strus query.
Why is the parameter for stem a token en and for punctuation the string "en"?