n8willis closed this 2 years ago
Interesting. That's a lot of set-theoretic machinery to define this from scratch, given that most regular expression syntaxes already allow [^a], although that's specifically a negated character class, which can be interpreted as shorthand for "the alternation of every possible character excluding 'a'", rather than a true complement operator that applies to arbitrary regexps.
Yeah, sometimes with these Unicode documents I feel like I'm missing some context that would make them easier to follow. E.g., they may have a particular RE language or system in mind for which this is a real improvement.
Having perused this a bit more, I don't see anything I think would affect shaping-level concerns. Possibly more useful for higher-level text handling.
Unicode TR18 was just updated to add 'not'/'complement' operators that formally distinguish between applying to a string and applying to a particular codepoint.
The point, I think, is that regular expressions need to be able to express "codepoint not U+ABCD" in a simple fashion without having that match "literally any string other than U+ABCD". So I do wonder whether that would help simplify any of the regular expressions used for syllable or subsequence matching.
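To make the distinction concrete, here is a minimal sketch in Python. It does not use TR18 syntax: Python's `re` module has no general complement operator, so the string-level "not" is emulated with a negative lookahead, which is my own workaround. The helper names are hypothetical, and U+ABCD is just the placeholder codepoint from the comment above.

```python
import re

# Codepoint-level "not": exactly one codepoint, and it is not U+ABCD.
# This is the ordinary negated character class.
CODEPOINT_NOT = re.compile(r"[^\uABCD]")

# String-level "not": any string at all, except the one-codepoint string U+ABCD.
# Assumption: emulated with a negative lookahead, since `re` has no
# complement operator for arbitrary sub-expressions.
STRING_NOT = re.compile(r"(?!\uABCD\Z).*", re.DOTALL)


def is_codepoint_not_target(s: str) -> bool:
    """True if s is a single codepoint other than U+ABCD."""
    return CODEPOINT_NOT.fullmatch(s) is not None


def is_string_not_target(s: str) -> bool:
    """True if s is any string other than exactly U+ABCD."""
    return STRING_NOT.fullmatch(s) is not None
```

The two predicates agree on a single codepoint such as "x" (both accept) and on U+ABCD itself (both reject), but they diverge everywhere else: the empty string and a two-codepoint string like "xy" are rejected by the codepoint-level check and accepted by the string-level one.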