tc39 / proposal-regexp-unicode-sequence-properties

Proposal to add support for sequence properties in Unicode property escapes to ECMAScript regular expressions.
https://github.com/tc39/proposal-regexp-set-notation

Should the regular expression engine be required to validate the character? #11

Closed nathanhammond closed 6 years ago

nathanhammond commented 6 years ago
mathiasbynens commented 6 years ago

This proposal defers to the Unicode Standard for the definitions of each of the sequence properties, just as existing property escapes do. See UTS #51, which refers to the data files for each property.
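To make the distinction concrete, here is a minimal sketch of how an existing property escape behaves on a single code point versus a multi-code-point sequence. It assumes an engine that supports `\p{Emoji}` with the `u` flag. The hand-written keycap pattern follows the keycap sequence definition in UTS #51; a sequence property (for example something like `\p{Emoji_Keycap_Sequence}`, named here as an assumption about the proposed syntax) would replace that hand-written pattern and is deliberately not executed below.

```javascript
// Existing property escapes match exactly one code point at a time.
const singleEmoji = /^\p{Emoji}$/u;

const grinningFace = '\u{1F600}';  // one code point
const keycapOne = '1\uFE0F\u20E3'; // three code points: "1", U+FE0F, U+20E3

const results = {
  // A single emoji code point matches the code point property.
  grinningFace: singleEmoji.test(grinningFace),
  // The keycap sequence is three code points, so the anchored
  // single-code-point pattern cannot match it as a whole.
  keycapOne: singleEmoji.test(keycapOne),
  // Without sequence properties, the sequence has to be spelled out by hand:
  // [0-9#*] followed by U+FE0F and U+20E3.
  keycapByHand: /^[0-9#*]\uFE0F\u20E3$/u.test(keycapOne),
};

console.log(results);
```

The hand-written pattern happens to be short for keycaps, but other sequence sets (flags, ZWJ sequences) are long lists that change with each Unicode release, which is where deferring to the Unicode data files matters.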

nathanhammond commented 6 years ago

The proposal defers to Unicode's set of sequences, which implies that the regular expression engine would be required to validate each sequence.

This proposal does not specify how an engine should maintain the list of valid sequences. That creates a risk of cross-platform divergence in behavior (possibly even within the same engine), somewhat similar to the Date issues that Microsoft has long faced. I'd be much more comfortable with a specific plan for how engines should update and maintain this list going forward.

How should this interact with Node's LTS policies?
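The maintenance concern can be illustrated with a userland sketch that builds a matcher from a list of sequences. This is not how any engine implements property escapes; it only shows what "maintaining the list" involves. The `sampleData` excerpt is hypothetical and deliberately incomplete, laid out as "code points ; property ; description" in the style of the Unicode emoji sequence data files, and the helper names (`parseSequences`, `escapeForRegExp`, `buildMatcher`) are made up for this example.

```javascript
// Hypothetical, deliberately incomplete excerpt. A real data file is much longer.
const sampleData = [
  '# comment lines start with "#"',
  '0023 FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap: #',
  '002A FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap: *',
  '0030 FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap: 0',
].join('\n');

// Turn the first field of each matching data line into a string.
function parseSequences(text, property) {
  const sequences = [];
  for (const line of text.split('\n')) {
    if (line.trim() === '' || line.startsWith('#')) continue;
    const [codePoints, name] = line.split(';').map((field) => field.trim());
    if (name !== property) continue;
    const sequence = codePoints
      .split(/\s+/)
      .map((hex) => String.fromCodePoint(parseInt(hex, 16)))
      .join('');
    sequences.push(sequence);
  }
  return sequences;
}

// Escape only regex syntax characters, which is all the `u` flag permits.
function escapeForRegExp(text) {
  return text.replace(/[\\^$.*+?()[\]{}|\/]/g, '\\$&');
}

// Build an anchored alternation over the known sequences.
function buildMatcher(sequences) {
  // Longest alternatives first so a longer sequence is not shadowed by a prefix.
  const alternatives = [...sequences]
    .sort((a, b) => b.length - a.length)
    .map(escapeForRegExp);
  return new RegExp(`^(?:${alternatives.join('|')})$`, 'u');
}

const keycaps = parseSequences(sampleData, 'Emoji_Keycap_Sequence');
const isKnownKeycap = buildMatcher(keycaps);

const inList = isKnownKeycap.test('#\uFE0F\u20E3');    // present in the sample list
const notInList = isKnownKeycap.test('1\uFE0F\u20E3'); // a keycap, but absent from this list
console.log({ inList, notInList });
```

Two environments that ship different versions of the list would disagree on the second test in exactly this way, which is the divergence being asked about.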

mathiasbynens commented 6 years ago

> This proposal does not specify how an engine should maintain the list of valid sequences.

It does not need to, as the ECMAScript spec already codifies this. The latest version of the Unicode Standard is required (https://github.com/tc39/ecma262/pull/620).
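As a rough way to observe which Unicode version an engine's property escapes reflect, the sketch below probes for scripts that were first encoded in specific Unicode versions. The probe table and the function names (`recognizesScript`, `newestRecognizedVersion`) are assumptions made for this example, and the result is only a lower bound limited to the versions listed in the table.

```javascript
// Each entry pairs a Unicode major version with a script first encoded in it.
const probes = [
  { version: 11, script: 'Dogra' },
  { version: 12, script: 'Wancho' },
  { version: 13, script: 'Yezidi' },
  { version: 14, script: 'Vithkuqi' },
  { version: 15, script: 'Kawi' },
];

// With the `u` flag, an unknown property value is a SyntaxError at
// construction time, so try/catch works as a feature test.
function recognizesScript(script) {
  try {
    new RegExp(`\\p{Script=${script}}`, 'u');
    return true;
  } catch (error) {
    return false;
  }
}

// Highest probed version whose script the engine recognizes, or null if
// none of the probed scripts is recognized.
function newestRecognizedVersion() {
  let newest = null;
  for (const { version, script } of probes) {
    if (recognizesScript(script)) newest = version;
  }
  return newest;
}

console.log(newestRecognizedVersion());
```

Running this on two runtimes (for example an older LTS release and a current one) shows whether their Unicode data differs, without relying on any runtime-specific version API.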

Once this proposal matures, I'll update https://github.com/mathiasbynens/unicode-property-escapes-tests, which generates the Test262 tests for Unicode property escapes, to include sequence property tests. These tests will be updated whenever the Unicode Standard gets an update. A tc39/ecma262 issue will be filed for every such update, detailing the changes.

I don't see how this is different from any other Unicode-related change. Am I misunderstanding your feedback?