haskell-hvr / regex-tdfa

Pure Haskell Tagged DFA Backend for "Text.Regex" (regex-base)
http://hackage.haskell.org/package/regex-tdfa

Brackets inside a Bracket Expression are not handled properly #30

Closed drsooch closed 2 years ago

drsooch commented 2 years ago

Attempting to add brackets inside a bracket expression seems to act differently based on the ordering. I spent a majority of my time attempting to escape the brackets (\\[) with no luck. Below I've provided a few variations on the same expression showcasing the issue.

I'm not extremely familiar with the various flavors of Regex, but this doesn't seem correct.

*ghci> ("Expected type: [Int] -> Int" :: Text) =~ "Expected type: ([[]a-zA-Z0-9 ->]+)" :: Bool
False
*ghci> ("Expected type: [Int] -> Int" :: Text) =~ "Expected type: ([][a-zA-Z0-9 ->]+)" :: Bool
True
*ghci> ("Expected type: [Int] -> Int" :: Text) =~ "Expected type: ([\\[\\]a-zA-Z0-9 ->]+)" :: Bool
False
andreasabel commented 2 years ago

It seems that inside [...], escaping via \ does not work: a backslash inside brackets is just the backslash character. So, in general, you cannot put ] inside brackets, as it closes the brackets. (Putting [ inside is fine.)
There is one exception though: you can put ] as the first character inside brackets, because empty brackets are not allowed (parse error). This is why your second try works:

ghci> "Expected type: [Int] -> Int" =~ "^Expected type: ([][a-zA-Z0-9 ->]+)$" :: Bool
True
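
To make the rule above concrete, here is a minimal sketch that splits a pattern at the point where its first bracket expression ends. It only models the rule as described in this thread (backslash is ordinary, the first unprotected ] closes the set, a leading ] is literal). The function name splitBracket is made up for illustration and is not part of regex-tdfa, and the sketch ignores [:class:], [.coll.] and [=equiv=] elements.

```haskell
-- | Split a pattern that starts with '[' into the raw contents of its
-- bracket expression and the remainder of the pattern.
-- Hypothetical helper: it models the rule discussed above, not the
-- actual regex-tdfa parser.
splitBracket :: String -> Maybe (String, String)
splitBracket ('[' : rest0) =
  let -- an optional leading '^' negates the set
      (neg, rest1) = case rest0 of
        '^' : r -> ("^", r)
        r       -> ("", r)
      -- a ']' in first position is a literal member, not the closer
      (lead, rest2) = case rest1 of
        ']' : r -> ("]", r)
        r       -> ("", r)
  in case break (== ']') rest2 of
       -- backslash gets no special treatment: the first ']' closes the set
       (body, ']' : after) -> Just (neg ++ lead ++ body, after)
       _                   -> Nothing   -- unterminated bracket expression
splitBracket _ = Nothing
```

Applied to the three patterns from the report, the first one ("[[]a-zA-Z0-9 ->]+") splits into the set "[" followed by the literal text "a-zA-Z0-9 ->]+", the second one ("[][a-zA-Z0-9 ->]+") keeps the whole intended set, and the third one closes at the ] right after the second backslash, leaving a set of just \ and [.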

Does this help?

P.S.: https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions#Character_classes says:

The ] character can be included in a bracket expression if it is the first character:

drsooch commented 2 years ago

It does, thanks!

Curious, shouldn't escaping work inside brackets?

andreasabel commented 2 years ago

@drsooch

Curious, shouldn't escaping work inside brackets?

I suppose so, but is it universally the case? What would be the references?

Btw, I wonder why the regex-* package family does not expose an abstract syntax for regexes (regex expression trees). It would maybe be tedious to write down the ASTs, but at least you would not have to battle with escaping and hard-to-comprehend parsers for regexes...
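
As a rough sketch of what such an abstract syntax could look like, the snippet below defines a tiny expression tree and renders it to a POSIX-style pattern string. Everything here (Regex, render, orderSet and the other names) is hypothetical and not exported by regex-base or regex-tdfa. The list of characters escaped outside brackets is an assumption about which characters are special there, and sets that are empty or consist only of ^ (optionally with -) are not handled.

```haskell
import Data.List (nub)

-- Hypothetical mini-AST for illustration only.
data Regex
  = Lit String     -- literal text, escaped when rendered
  | OneOf [Char]   -- bracket expression over literal characters
  | Seq [Regex]    -- concatenation
  | Many1 Regex    -- one or more repetitions
  | Group Regex    -- parenthesised group

-- | Render the tree to a pattern string.
render :: Regex -> String
render (Lit s)    = concatMap escape s
render (OneOf cs) = "[" ++ orderSet (nub cs) ++ "]"
render (Seq rs)   = concatMap render rs
render (Many1 r)  = atom r ++ "+"
render (Group r)  = "(" ++ render r ++ ")"

-- | Wrap a term in parentheses unless a postfix operator can already
-- apply to it as a unit.
atom :: Regex -> String
atom r@(OneOf _) = render r
atom r@(Group _) = render r
atom r           = "(" ++ render r ++ ")"

-- | Backslash-escape characters assumed to be special outside brackets.
escape :: Char -> String
escape c
  | c `elem` "\\.[()*+?{|^$" = ['\\', c]
  | otherwise                = [c]

-- | Order set members so that none needs escaping:
-- ']' first, '[' after the plain members, '^' not first, '-' last.
orderSet :: [Char] -> String
orderSet cs = pick ']' ++ middle ++ pick '^' ++ pick '-'
  where
    pick c = filter (== c) cs
    middle = filter (`notElem` "]^-[") cs ++ pick '['
```

With this, the pattern from the report could be written as render (Seq [Lit "Expected type: ", Group (Many1 (OneOf ("[] ->" ++ ['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9'])))]), and the renderer takes care of placing ] and - where they are read literally. Note that atom introduces extra parenthesised groups, which shifts submatch numbering.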

drsooch commented 2 years ago

I just noticed the link you pasted and took a look. Also looks like there is a specific example of escaped brackets (granted they are not being used inside a bracket expression). Thanks again!