morganstanley / hobbes

A language and an embedded JIT compiler
http://hobbes.readthedocs.io/
Apache License 2.0

regex pattern-matching: match implements search-like behavior instead of full-match behavior #138

Open smunix opened 6 years ago

smunix commented 6 years ago

I noticed the following issues with regex pattern matching:

1. Binding the same name in two different pattern rows confuses the result:

    match "fooooooooobar" with | 'f(?o)bar' -> o | '(?.+)(?o)bar' -> o | -> "none"
    ""
    match "fooooooooobar" with | 'f(?o)bar' -> o | '(?.+)(?o)bar' -> od | -> "none"
    "ooooooooo"

2. A .+ followed by another regex in sequence obliterates the regexes that follow it: the name bound to .+ captures the entire input. Although this behaviour is arguably correct, a user might expect stricter matching here. Some regex engines differentiate between matches (the pattern must cover the whole input) and searches (the pattern may match any substring); is that something we'd benefit from having here too?

    match "fooooooooobar" with | 'f(?o)bar' -> o | '(?.+)(?o)bar' -> h | -> "none"
    "ooooooooo"
    match "fooooooooobar" with | 'a(?o)bar' -> o | '(?.+)(?o)bar' -> h | -> "none"
    "fooooooooobar"
    match "fooooooooobar" with | 'a(?o)bar' -> o | '(?.+)(?o)bar' -> hd | -> "none"
    "fooooooooobar"
    match "fooooooooobar" with | 'a(?o)bar' -> o | '(?.)(?o)bar' -> hd | -> "none"
    "f"
    match "fooooooooobar" with | 'a(?o)bar' -> o | '(?.+)(?o)bar' -> hd | _ -> "none"
    "fooooooooobar"

kthielen commented 6 years ago

Yes, good points about the regex binding code. That support was initially added with just the 2-row either/or match in mind. It's worth a second look.

smunix commented 6 years ago

The culprit seems to be this line of code. I'll work on a fix once I've got some free cycles. It's a minor issue.