patrickfrey / strusUtilities

A set of command line programs to access the strus information retrieval engine
http://www.project-strus.net
Mozilla Public License 2.0

strusPatternMatcher: defining two matching strings in rules #57

Closed andreasbaumann closed 6 years ago

andreasbaumann commented 6 years ago

I have the following:


P1 = /xxx/;
P2 = /yyy/;

Then two rules:

R1 = any( p1 = P1 "abc" );
R2 = any( p2 = P2 "cde" );

Each rule, R1 and R2, works individually, but enabling both leads to:

symbol defined twice "cde"

andreasbaumann commented 6 years ago

I found counterexamples where it works. It seems to be completely nondeterministic!

andreasbaumann commented 6 years ago

Found an example similar to the original one:

test.rul:

EMAIL : /[a-zA-Z0-9.]+@[a-zA-Z]+\.[a-zA-Z]+/;

PHONE : /\+1 +[0-9]{3,4} *[0-9]{3,4} *[0-9]{3,4}/;

Phone = any( phone = PHONE "+1 234 234 234" );
Phone = any( phone = PHONE "+1 333 333 333" );

Email = any( email = EMAIL "theone@i.want" );

test0.txt:

test1.txt:

24234-2342342
234234
14-22
+1 234 234 234
+1 333 333 333
+1 433 553 433
blabla@blublu.bli
theone@i.want

strusPatternMatcher -C text/plain -m modstrus_analyzer_pattern -F -K -p test.rul tests
load program ...
ERROR failed to load program: failed to load pattern match program: failed to define regular expression pattern symbol: symbol defined twice: 'theone@i.want'

If I comment out both 'Phone' rules, it works. It also works if I comment out the 'Email' rule.

andreasbaumann commented 6 years ago

The file 'tests' is the file list passed with -F:

test0.txt
test1.txt
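
For convenience, the reproduction can be sketched as a small shell script. The rule file, the contents of test1.txt, the file list and the command line are copied from this thread. Three things are assumptions: test0.txt is created empty because its content is not shown here, the scratch directory is arbitrary, and the check for the binary is only there so the script does not abort where strusPatternMatcher is not installed.

```shell
#!/bin/sh
# Recreate the files from this report in a scratch directory (assumed location).
workdir=$(mktemp -d)
cd "$workdir"

# Rule file as posted in the report.
cat > test.rul <<'EOF'
EMAIL : /[a-zA-Z0-9.]+@[a-zA-Z]+\.[a-zA-Z]+/;

PHONE : /\+1 +[0-9]{3,4} *[0-9]{3,4} *[0-9]{3,4}/;

Phone = any( phone = PHONE "+1 234 234 234" );
Phone = any( phone = PHONE "+1 333 333 333" );

Email = any( email = EMAIL "theone@i.want" );
EOF

# Assumption: the content of test0.txt is not shown in the report, so it is left empty.
: > test0.txt

# Input document as posted in the report.
cat > test1.txt <<'EOF'
24234-2342342
234234
14-22
+1 234 234 234
+1 333 333 333
+1 433 553 433
blabla@blublu.bli
theone@i.want
EOF

# File list passed to the matcher as its last argument.
printf 'test0.txt\ntest1.txt\n' > tests

# Run the command from the report only if the binary is available.
if command -v strusPatternMatcher >/dev/null 2>&1; then
    strusPatternMatcher -C text/plain -m modstrus_analyzer_pattern -F -K -p test.rul tests \
        || echo "strusPatternMatcher exited with an error (see output above)"
else
    echo "strusPatternMatcher not found; files were written to $workdir"
fi
```

Commenting out either the two 'Phone' lines or the 'Email' line in test.rul and re-running should match the working cases described above.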

patrickfrey commented 6 years ago

Fixed in the pattern matcher: previously, only one type of symbol could be defined.