patrickfrey / strusAnalyzer

Library for document analysis (segmentation, tokenization, normalization, aggregation) whose goal is to produce a set of items that can be inserted into a strus storage. It also provides some functions for analysing tokens or phrases of a strus query.
http://www.project-strus.net
Mozilla Public License 2.0

inconsistent parameters #51

Open andreasbaumann opened 7 years ago

andreasbaumann commented 7 years ago
    sent = empty punctuation("en","") /doc/text//();
    stem = lc:convdia(en):stem(en) word /doc/title();

Why is the parameter for stem the bare token en, while for punctuation it is the string "en"?

patrickfrey commented 7 years ago

The two forms are interchangeable if the token is an identifier matching [a-zA-Z]+.
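
To illustrate the rule, here is a minimal Python sketch of the check it implies. This is not strusAnalyzer code: the helper names `can_be_unquoted` and `render_argument` are hypothetical, and the only thing taken from this thread is the identifier pattern [a-zA-Z]+. Escaping of quotes inside a string argument is deliberately left out.

```python
import re

# Pattern quoted in the reply above: one or more ASCII letters.
IDENTIFIER = re.compile(r"[a-zA-Z]+")


def can_be_unquoted(arg: str) -> bool:
    """Return True if the argument may be written as a bare token.

    Hypothetical helper: it only mirrors the rule stated in this thread.
    """
    return IDENTIFIER.fullmatch(arg) is not None


def render_argument(arg: str) -> str:
    """Write the argument bare when allowed, otherwise as a quoted string.

    Assumption: no escaping is needed (the argument contains no quotes).
    """
    return arg if can_be_unquoted(arg) else '"' + arg + '"'


if __name__ == "__main__":
    for value in ["en", "", "en_US"]:
        print(repr(value), "->", render_argument(value))
```

Under this reading, en and "en" name the same argument, while the empty second argument of punctuation has to stay quoted as "" because an empty token cannot match [a-zA-Z]+.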