Closed milseman closed 2 years ago
@hamishknight do you think you could throw a few tests together and implement the second alternative?
@milseman Sure, how do we want to classify a metacharacter scalar? Should we limit it to [](){}^$.|\
for now, or just any non-letter non-number non-\r\n
ASCII? I'm inclined to say the latter, what do you think?
Hmm, we also have contextual meta-characters as well as escapes. If we have try Regex("\\d\u{301}")
what should we do?
[^\u{301}a-f]
(?<\u{301}name>.)
(?>\u{301}.)
(where the \u{301}
is not a regex literal scalar but a string literal scalar)
@milseman I believe that's already rejected due to being an unknown letter escape:
Great! I don't know of an organic case for ASCII [^a-zA-Z0-9]
with a combining scalar or anything like that. It seems like a good first approximation and we can relax it in the future. Alternatively, we could do it for all meta or contextually-meta characters.
There's also a canonical equivalent for ?
, IIRC. @Azoy do you recall the details?
There's also a canonical equivalent for ?, IIRC. @Azoy do you recall the details?
I believe there are scalars who have compatibility equivalents with ?
, but I don't believe there are any canonical ones.
The regex parser is based over
Character
, which is fine, but means that some programs could put combining scalars following meta characters and those will not compare equal. We have a few options:The latter seems simpler and there's an easy (and highly advisable!) fall back path of representing the combining scalar through an escape.