haskell-hvr / regex-tdfa

Pure Haskell Tagged DFA Backend for "Text.Regex" (regex-base)
http://hackage.haskell.org/package/regex-tdfa

dash in range not correctly parsed #1

Closed: neongreen closed 2 years ago

neongreen commented 4 years ago

https://github.com/ChrisKuklewicz/regex-tdfa/issues/24, originally reported by @pjljvandelaar


As specified in https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html, the expression "[--@]" matches any of the characters between '-' and '@' inclusive.

However, ("@" =~ "[--@]") results in

Explict error in module Text.Regex.TDFA.String : 
Text.Regex.TDFA.String died: parseRegex for Text.Regex.TDFA.String failed:"[--@]" (line 1, column 4):
       unexpected A dash is in the wrong place in a bracket
       expecting "]"
       CallStack (from HasCallStack):
         error, called at .\Text\Regex\TDFA\Common.hs:29:3 in regex-tdfa-1.2.3.1-DVMXTrvIFHgDCky8s203W0:Text.Regex.TDFA.Common)
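
For reference, the set of characters that "[--@]" is expected to match can be enumerated directly. This is a minimal sketch, assuming the C/POSIX locale, where collation order coincides with code-point order; the names dashToAt and inDashToAt are made up for illustration and are not part of regex-tdfa.

```haskell
import Data.Ix (inRange)

-- All characters from '-' (code point 45) to '@' (code point 64), inclusive.
-- Assumption: code-point order stands in for the locale's collation order.
dashToAt :: String
dashToAt = ['-' .. '@']

-- Membership test for the same range (hypothetical helper).
inDashToAt :: Char -> Bool
inDashToAt = inRange ('-', '@')

main :: IO ()
main = do
  putStrLn dashToAt              -- prints: -./0123456789:;<=>?@
  print (map inDashToAt "@-,A")  -- prints: [True,True,False,False]
```

So "@" is the upper end point of the range and should match, instead of triggering a parse error.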

andreasabel commented 2 years ago

The error is also triggered for an empty range, e.g. [1-0]:

ghci> ("1" =~ "[1-0]") :: String
"*** Exception: Explict error in module Text.Regex.TDFA.String : Text.Regex.TDFA.String died: parseRegex for Text.Regex.TDFA.String failed:"[1-0]" (line 1, column 4):
unexpected A dash is in the wrong place in a bracket
expecting "]"
CallStack (from HasCallStack):
  error, called at lib/Text/Regex/TDFA/Common.hs:31:3 in regex-tdfa-1.3.1.3-inplace:Text.Regex.TDFA.Common

I think the intention was to give a different error in this case: https://github.com/haskell-hvr/regex-tdfa/blob/85c89c96a206a9706b67b0311c683ec02669b24b/lib/Text/Regex/TDFA/ReadRegex.hs#L130-L137

However, this error is swallowed by <|>, which falls through to the next alternative when the range parser fails: https://github.com/haskell-hvr/regex-tdfa/blob/85c89c96a206a9706b67b0311c683ec02669b24b/lib/Text/Regex/TDFA/ReadRegex.hs#L118-L119

So we end up with the error triggered by the last alternative: https://github.com/haskell-hvr/regex-tdfa/blob/85c89c96a206a9706b67b0311c683ec02669b24b/lib/Text/Regex/TDFA/ReadRegex.hs#L139-L144
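
The effect can be reproduced in a small self-contained model. This is only a sketch: it does not use Parsec or the actual code in ReadRegex.hs, and the names Elem, orElse, rangeElem, charElem and elems are hypothetical. It assumes a backtracking choice that discards the error of the first alternative, which is roughly what happens in the linked code.

```haskell
-- Simplified model of a bracket-expression body (the text after '[').
data Elem = Range Char Char | Single Char deriving (Eq, Show)

type Parser a = String -> Either String (a, String)

-- Backtracking choice: if p fails, its error message is dropped and q is tried.
orElse :: Parser a -> Parser a -> Parser a
orElse p q s = either (const (q s)) Right (p s)

-- A range "lo-hi"; an empty range is rejected with a specific message.
rangeElem :: Parser Elem
rangeElem (lo : '-' : hi : rest)
  | lo `notElem` "]-" && hi /= ']' =
      if lo <= hi
        then Right (Range lo hi, rest)
        else Left "End point of dashed character range is less than starting point"
rangeElem _ = Left "not a range"

-- A single character; a dash is only accepted right before the closing ']'.
charElem :: Parser Elem
charElem ('-' : c : _)
  | c /= ']' = Left "A dash is in the wrong place in a bracket"
charElem (c : rest)
  | c /= ']' = Right (Single c, rest)
charElem _ = Left "expecting a bracket element"

-- Parse elements up to (not including) the closing ']'.
elems :: Parser [Elem]
elems s@(']' : _) = Right ([], s)
elems s = do
  (e, rest) <- (rangeElem `orElse` charElem) s
  (es, rest') <- elems rest
  Right (e : es, rest')

main :: IO ()
main = do
  print (elems "a-c]")  -- a well-formed range is accepted
  print (elems "1-0]")  -- reports the dash error, not the empty-range error
```

In this model, "1-0]" first fails in rangeElem with the empty-range message, the choice falls back to charElem, which accepts '1', and the remaining "-0]" then fails with the dash message, so the more specific error never reaches the user.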

andreasabel commented 2 years ago

The fix for the original report is simply to stop restricting where - may appear in a bracket expression. E.g. [--] would then be valid, meaning either - or - (that is, just -). So, we delete this check: https://github.com/haskell-hvr/regex-tdfa/blob/85c89c96a206a9706b67b0311c683ec02669b24b/lib/Text/Regex/TDFA/ReadRegex.hs#L141-L143