sjshuck / hs-pcre2

Complete Haskell binding to PCRE2
Apache License 2.0

`\R` not able to match U+2028 or U+2029 #26

Closed sjshuck closed 2 years ago

sjshuck commented 2 years ago

This occurs on the text-2 branch:

matchesOpt (Bsr BsrUnicode) "\\R" "\x2028"

This used to work in UTF-16 builds, but it fails in UTF-8 mode. Narrower characters such as `\f` are still matched, however. Judging from the C code, both `\f` and `\x2028` should match whenever `PCRE2_BSR_UNICODE` is set and the `EBCDIC` macro is undefined.
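For reference, the expected behaviour can be written down as a small pure-Haskell model. This is only a sketch of what `\R` is documented to accept for single code points, not PCRE2's implementation and not part of this library; the names `isUnicodeNewline` and `isAnyCrlfNewline` are hypothetical.

```haskell
import Data.Char (ord)

-- | Hypothetical reference model: True when the code point is one of the
-- single-character newlines that \R accepts under PCRE2_BSR_UNICODE
-- (LF, VT, FF, CR, NEL, U+2028, U+2029).
-- The two-character sequence CR LF is also accepted by \R but is not
-- modelled here.
isUnicodeNewline :: Char -> Bool
isUnicodeNewline c =
  ord c `elem` [0x0A, 0x0B, 0x0C, 0x0D, 0x85, 0x2028, 0x2029]

-- | The narrower set accepted under PCRE2_BSR_ANYCRLF, for comparison:
-- only CR and LF (plus the CR LF pair, again not modelled).
isAnyCrlfNewline :: Char -> Bool
isAnyCrlfNewline c = c == '\r' || c == '\n'
```

Under this model both `'\f'` and `'\x2028'` satisfy `isUnicodeNewline`, which is the result the call above should agree with.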

This is a lower-priority bug, but it may indicate deeper issues, such as multi-byte UTF-8 characters (2+ bytes) being mishandled.
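To make the width difference concrete, here is a hand-rolled UTF-8 encoder using only `base`. It is an illustrative sketch (the name `utf8Bytes` is hypothetical and nothing here comes from the library); it shows that `\f` is a single byte while U+2028 and U+2029 take three bytes each.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode one code point as UTF-8 bytes. Written by hand for
-- illustration only; surrogate code points are not rejected.
utf8Bytes :: Char -> [Word8]
utf8Bytes c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [lead 0xC0 6, cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading byte: length prefix bits plus the highest payload bits.
    lead prefix s = fromIntegral (prefix .|. (n `shiftR` s))
    -- Continuation byte: 10xxxxxx carrying six payload bits.
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))
```

With this, `utf8Bytes '\f'` is `[0x0C]`, `utf8Bytes '\x2028'` is `[0xE2, 0x80, 0xA8]`, and `utf8Bytes '\x85'` (NEL, also accepted by `\R` in Unicode mode) is `[0xC2, 0x85]`. Testing NEL as well might help tell whether the problem affects every multi-byte character or only the three-byte ones.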