larsgw closed this 2 years ago.
Important: this also removes 0x7F (DEL), but I can put that back.
Related note: because the document is required to be UTF-8, unicode is potentially meant to refer to Unicode Scalar Values; that is, [\0-\u{D7FF}\u{E000}-\u{10FFFF}], not [\0-\u{10FFFF}] (see #207).
Moving this to target the v2 branch since it includes breaking changes.
> Related note: due to the requirement for the document to be UTF-8, unicode potentially is meant to refer to Unicode Scalar Values; that is, [\0-\u{D7FF}\u{E000}-\u{10FFFF}], not [\0-\u{10FFFF}] (see #207).
Since the UTF-8 requirement already exists, this distinction is moot: the surrogate code points (U+D800 to U+DFFF) can't be validly encoded in UTF-8 anyway.
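A quick way to check that claim, sketched in Python (the thread has no code of its own, so the language is our choice, and the function names are illustrative). It relies on CPython's strict UTF-8 codec: over the whole code space, the code points it can encode are exactly the scalar values, so subtracting the surrogate range from unicode changes nothing for a valid UTF-8 document.

```python
def is_scalar_value(cp: int) -> bool:
    """Membership in U+0000..U+D7FF or U+E000..U+10FFFF (the Unicode scalar values)."""
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


def utf8_encodable(cp: int) -> bool:
    """Return True if Python's strict UTF-8 codec can encode code point cp."""
    try:
        chr(cp).encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


# Over the whole code space U+0000..U+10FFFF, "encodable as UTF-8" and
# "is a scalar value" pick out exactly the same code points.
mismatches = [cp for cp in range(0x110000) if utf8_encodable(cp) != is_scalar_value(cp)]
print(len(mismatches))  # 0

# Decoding is rejected too: ED A0 80 would be the encoding of U+D800.
try:
    b"\xed\xa0\x80".decode("utf-8")
    decoded_surrogate = True
except UnicodeDecodeError:
    decoded_surrogate = False
print(decoded_surrogate)  # False
```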
This change (almost certainly unintentionally) would allow non-ASCII linespace as valid ident chars, which I think would be a bad idea. (Manually resolving a subtraction away can be tricky!)
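To make the subtraction pitfall concrete, here is a hypothetical Python sketch. The NEWLINE and WHITESPACE sets below are simplified assumptions for illustration, not quoted from the KDL grammar. It compares the subtraction form unicode - linespace with a hand-expanded class that only carves out the ASCII linespace characters, and lists what the latter wrongly admits.

```python
# Simplified, ASSUMED character sets for illustration only; these are
# not copied from the KDL grammar.
NEWLINE = {0x000A, 0x000C, 0x000D, 0x0085, 0x2028, 0x2029}
WHITESPACE = {0x0009, 0x0020, 0x00A0, 0x1680, *range(0x2000, 0x200B),
              0x202F, 0x205F, 0x3000, 0xFEFF}
LINESPACE = NEWLINE | WHITESPACE
ASCII_LINESPACE = {cp for cp in LINESPACE if cp < 0x80}


def is_scalar_value(cp: int) -> bool:
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


def ident_char_minus(cp: int) -> bool:
    """Subtraction form: unicode - linespace."""
    return is_scalar_value(cp) and cp not in LINESPACE


def ident_char_hand_expanded(cp: int) -> bool:
    """Hypothetical hand-expanded class that only excludes ASCII linespace."""
    return is_scalar_value(cp) and cp not in ASCII_LINESPACE


# Code points the hand-expanded class wrongly accepts as identifier characters.
leaked = sorted(cp for cp in LINESPACE
                if ident_char_hand_expanded(cp) and not ident_char_minus(cp))
print([f"U+{cp:04X}" for cp in leaked])
```

Every non-ASCII entry of the assumed LINESPACE set (for example U+2028 LINE SEPARATOR) ends up in leaked, which is the kind of slip described above.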
I'll try to fix it when I can.
I'd prefer this not be merged. It currently changes the grammar in two unintended ways, but even once those are fixed, I personally find the minus syntax perfectly readable and easy to translate to code. This PR doesn't remove all the minuses anyway; we're left with - keyword.
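As an illustration of how directly the minus syntax maps to code, here is a hypothetical Python combinator. The names minus, chars, unicode_ and digit, and the example rule unicode - digit - [{}()], are invented for this sketch and are not KDL productions.

```python
from typing import Callable

# A character class is modelled as a predicate over code points.
CharClass = Callable[[int], bool]


def minus(base: CharClass, *excluded: CharClass) -> CharClass:
    """Translate a grammar rule `base - e1 - e2 ...` into a single predicate."""
    def pred(cp: int) -> bool:
        return base(cp) and not any(ex(cp) for ex in excluded)
    return pred


def chars(s: str) -> CharClass:
    """A literal bracket class such as [{}()]."""
    codepoints = {ord(c) for c in s}
    return lambda cp: cp in codepoints


def unicode_(cp: int) -> bool:
    """Unicode scalar values."""
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


def digit(cp: int) -> bool:
    return 0x30 <= cp <= 0x39


# Hypothetical rule: unicode - digit - [{}()]
rule = minus(unicode_, digit, chars("{}()"))
print(rule(ord("a")), rule(ord("7")), rule(ord("{")))  # True False False
```

Each subtracted term becomes one more excluded predicate, so the rule reads almost the same in code as in the grammar.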
More importantly, though, if we fix #200 with #241 in v2, this PR won't be complete anyway and will need further non-trivial revision. The #241 fix, on the other hand, uses minus more heavily and imo remains very readable.
Fix #191