mhulden / foma

Automatically exported from code.google.com/p/foma

Quantified concatenation using < or > fails when not escaped, different from xfst & hfst #114

Open snomos opened 3 years ago

snomos commented 3 years ago

In writing a URL parser, I have the following lexicon:

LEXICON realdomain
    < [ a | b | c | d | e | f | g | h | i | j | k
      | l | m | n | o | p | q | r | s | t | u | v
      | w | x | y | z | A | B | C | D | E | F | G
      | H | I | J | K | L | M | N | O | P | Q | R
      | S | T | U | V | W | X | Y | Z |%-
      |%0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ]^>1 %. > topdomainlist ;

This fails in Foma with the following error:

    ***Syntax error on line 49 column 616 at '>'

If I escape the quantifier as ^%>1, the regex compiles in Foma, but the same source then fails in both Xfst:

    *** Warning: regex_parse: Positive integer expeted, got 0. ***

and Hfst-xfst:

    *** xre parsing failed: syntax error, unexpected LEXER_ERROR, expecting end of file
    ***    parsing […]
          |%_ |%? |%& |%= |%% |%@ |%. |%/ |%~ ]^%>1  [near ^] on line 27...
    Unable to parse regular expression
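
A possible portable formulation, assuming ^>1 means "more than one repetition" (that is, two or more), is to avoid the > character inside the angle brackets and spell the repetition out with Kleene plus instead. This is an untested sketch, not a confirmed fix: the character class is shortened to a | b | c for readability, and the Root entry and the single topdomainlist entry are placeholders rather than the contents of my actual file.

    ! Sketch only: alphabet shortened, Root and topdomainlist are placeholders.
    LEXICON Root
    realdomain ;

    LEXICON realdomain
    ! [X X+] should accept the same strings as X^>1, without a bare or escaped >.
        < [ a | b | c ] [ a | b | c ]+ %. > topdomainlist ;

    LEXICON topdomainlist
    com # ;

Since this uses only union, concatenation and +, it should not depend on how each tool treats > inside a lexc regex entry, but it does not resolve the underlying difference between Foma and Xfst/Hfst.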