Library for document analysis (segmentation, tokenization, normalization, aggregation). Its goal is to produce a set of items that can be inserted into a strus storage. It also provides some functions for analyzing tokens or phrases of a strus query.
punctuation: produces punctuation elements (end-of-sentence recognition). The language is specified as the first parameter (currently only German 'de' and English 'en' are supported).
The second parameter specifies a set of characters that should also be recognized as punctuation, in addition to end-of-sentence markers. If it is not specified, the default is reasonable for European languages.
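To make the role of the two parameters concrete, here is a minimal Python sketch of the idea. It is not the strus implementation or its API: the function name `punctuation_tokens`, the abbreviation lists, and the default character set `",;:"` are all assumptions chosen for illustration. The sketch emits a token for every end-of-sentence marker it recognizes for the given language, and additionally for every character listed in the second parameter.

```python
# Hypothetical sketch; names and defaults are illustrative assumptions,
# not the strus punctuation tokenizer or its actual defaults.

# Tiny per-language abbreviation lists, used to avoid treating the dot
# in "Dr." or "usw." as the end of a sentence (assumed, far from complete).
ABBREVIATIONS = {
    "en": {"mr", "mrs", "dr", "etc"},
    "de": {"dr", "z.b", "usw", "bzw"},
}


def punctuation_tokens(text, language="en", extra_chars=",;:"):
    """Return (position, character) pairs for recognized punctuation.

    language    -- first parameter: selects the end-of-sentence rules
    extra_chars -- second parameter: characters that are also reported
                   as punctuation, besides end-of-sentence markers
    """
    if language not in ABBREVIATIONS:
        raise ValueError("unsupported language: %r" % language)

    tokens = []
    for pos, ch in enumerate(text):
        if ch in extra_chars:
            # Explicitly requested punctuation is always reported.
            tokens.append((pos, ch))
        elif ch in ".!?":
            # Candidate sentence end: must be followed by whitespace
            # or by the end of the text ...
            at_end = pos + 1 == len(text) or text[pos + 1].isspace()
            # ... and must not terminate a known abbreviation.
            words = text[:pos].split()
            word_before = words[-1].lower() if words else ""
            if at_end and word_before not in ABBREVIATIONS[language]:
                tokens.append((pos, ch))
    return tokens


sample = "Dr. Smith came, then left."
print(punctuation_tokens(sample, "en"))      # comma and final dot
print(punctuation_tokens(sample, "en", ""))  # final dot only
```

With the second argument left at its assumed default, the comma is reported alongside the sentence end; with an empty string, only the sentence end remains. That difference is what the second parameter controls.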
What is the meaning of the second parameter?