patrickfrey / strusAnalyzer

Library for document analysis (segmentation, tokenization, normalization, aggregation) with the goal of producing a set of items that can be inserted into a strus storage. Some functions for analysing tokens or phrases of strus queries are also provided.
http://www.project-strus.net
Mozilla Public License 2.0

document markup does not resolve overlapping markups correctly #67

Open · patrickfrey opened 6 years ago

patrickfrey commented 6 years ago

If you want to mark up a document with matching patterns, you have to either declare the patterns as exclusive (%MATCHER exclusive) or rely on a correct implementation of the ousting of lower-priority matches by higher-priority matches. The latter mechanism, implemented in

    void TokenMarkupContextInterface::putMarkup(
                    const analyzer::Position& start,
                    const analyzer::Position& end,
                    const analyzer::TokenMarkup& markup,
                    unsigned int level);

does not work: overlapping matches in the content are not marked up correctly, and lower-level markup of an area is not eliminated when a higher-level markup covers that area.

patrickfrey commented 6 years ago

Current workaround: Do not put overlapping markups.