patrickfrey / strusUtilities

A set of command line programs to access the strus information retrieval engine
http://www.project-strus.net
Mozilla Public License 2.0

strusPatternMatcher: unclear pattern matching in rules #56

Open andreasbaumann opened 6 years ago

andreasbaumann commented 6 years ago

Assuming I have a regex with two capturing subgroups:

P : /([0-9]+)-(0-9)/;

R = any( p = P "14-22" );

returns the error message "symbol defined twice '14-22'".

So is the value given in the rule matched against the whole regex match, or are there N strings, one for each subgroup? And how do I differentiate between the two?

andreasbaumann commented 6 years ago

So currently I can only match against the whole match of a regex, it seems.

patrickfrey commented 6 years ago

I cannot reproduce the error. One thing to mention here: R = any( p = P "14-22" ); cannot match, because "14-22" is not a possible match of /([0-9]+)-(0-9)/. It would match if, for example, P was defined as P : /([0-9]+)-([0-9]+)/;
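The difference between the two patterns can be checked outside strus. The following is a minimal sketch using Python's `re` module, under the assumption that the regex dialect used by strusPatternMatcher treats `(0-9)` the same way, namely as a group containing the literal text `0-9` rather than a digit class. The helper name `matches` is made up for this illustration.

```python
import re

# Second group is the literal three-character text "0-9", not a digit class.
broken = re.compile(r"([0-9]+)-(0-9)")
# Second group is a run of digits, as intended.
fixed = re.compile(r"([0-9]+)-([0-9]+)")


def matches(pattern, text):
    """Return True if the whole text is matched by the pattern."""
    return pattern.fullmatch(text) is not None


print(matches(broken, "14-22"))   # the value from the rule
print(matches(broken, "14-0-9"))  # the only kind of string the broken form accepts
print(matches(fixed, "14-22"))
```

With the broken pattern, a rule value such as "14-22" can never be produced by P, so the rule cannot fire regardless of the input document.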

For

P : /([0-9]+)-([0-9]+)/;
R = any( p = P "14-22" );

with the example document

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

123-32 654-64 -645 35-56 14-22 14-23

I get:

strusPatternMatcher -m modstrus_analyzer_pattern -K -p program.rul input.xml
load program ...
start matching ...
input.xml:
1 [68] : 1 P 123-32
2 [75] : 1 P 654-64
3 [87] : 1 P 35-56
4 [93] : 1 P 14-22
4 [93] : 16777217 14-22 14-22
5 [99] : 1 P 14-23
R [4..5, 68|25 .. 68|30]:
p [4..5, 68|25 .. 68|30] '14-22'
OK done

The error message "symbol defined twice '14-22'" makes no sense in any case. Hopefully it will be reproducible somehow.

andreasbaumann commented 6 years ago

My mistake: for the 'symbol defined twice' error there is another bug: https://github.com/patrickfrey/strusUtilities/issues/57.

Here I wanted to point out that the value in the rule is always matched against the full regex match. Can I match subexpressions and formulate rule conditions on the subexpressions?

So this is both a feature request and a question about whether that would be simple and/or usable. :-)
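To make the distinction in the question concrete, here is a small sketch in Python's `re` module, not in the strus rule language. It contrasts a condition on the whole match with a condition on a single capturing subgroup, using the tokens from the example document above. The function names `rule_on_whole` and `rule_on_first_group` are hypothetical and only illustrate the requested behaviour; they do not correspond to any existing strusPatternMatcher feature.

```python
import re

P = re.compile(r"([0-9]+)-([0-9]+)")

tokens = "123-32 654-64 -645 35-56 14-22 14-23".split()


def rule_on_whole(token, value):
    """Condition on the whole match, as the rule value behaves today."""
    m = P.fullmatch(token)
    return m is not None and m.group(0) == value


def rule_on_first_group(token, value):
    """Hypothetical condition on the first capturing subgroup only."""
    m = P.fullmatch(token)
    return m is not None and m.group(1) == value


whole_hits = [t for t in tokens if rule_on_whole(t, "14-22")]
group_hits = [t for t in tokens if rule_on_first_group(t, "14")]

print(whole_hits)
print(group_hits)
```

A condition on the whole match selects only the token "14-22", while a condition on the first subgroup would select every token whose first number is 14.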