mathiasbynens / es-regexp-unicode-character-class-escapes

Proposal to improve the character class escape tokens `\d`, `\D`, `\w`, `\W`, and the word boundary assertions `\b` and `\B` in ES6 Unicode regular expressions (with the `u` flag).
http://esdiscuss.org/topic/questions-regarding-es6-unicode-regular-expressions#content-3
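For context, here is a small TypeScript check of the behaviour this proposal targets. In ES2015 and later, `\d` and `\w` stay ASCII-only even with the `u` flag, and `\b` is defined in terms of `\w`, so it stays ASCII-only too. `classifyChar` is a hypothetical helper for illustration. The expected outputs assume a standards-compliant ES2015+ engine.

```typescript
// Current ES2015+ behaviour that the proposal wants to change:
// even with the `u` flag, \d matches only [0-9] and \w only [A-Za-z0-9_].
const DIGIT_U = /^\d$/u;
const WORD_U = /^\w$/u;

// Hypothetical helper: does a single character match \d / \w under `u`?
function classifyChar(ch: string): { digit: boolean; word: boolean } {
  return { digit: DIGIT_U.test(ch), word: WORD_U.test(ch) };
}

console.log(classifyChar("7").digit);      // true
console.log(classifyChar("\u0663").digit); // false (ARABIC-INDIC DIGIT THREE)
console.log(classifyChar("\u00E9").word);  // false (é)

// \b is defined in terms of \w, so it sees a word boundary inside "café".
console.log(/caf\b/u.test("caf\u00E9"));   // true
```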

`\p{…}` and `\P{…}` spec text #4

Closed: ghost closed this issue 7 years ago

ghost commented 8 years ago

claudepache commented 8 years ago

Note that the syntax in 11.8.5 (Regular Expression Literals) is only used to recognise where a regular expression literal /.../ ends in ES code, not to analyse its contents. At first glance it should not need to change: \p and \P are already handled properly, and { and } play no special role in that grammar.
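To illustrate the point, here is a simplified TypeScript sketch of the 11.8.5 scanning rules. It is a hypothetical helper, not spec text, and it works under these rules:

- A backslash always consumes the following character as an opaque pair.
- A `/` inside `[...]` does not end the literal.
- Flags are approximated as ASCII identifier characters rather than the full IdentifierPart set.

Under these rules `\p{Letter}` passes through untouched.

```typescript
// Simplified scanner for RegularExpressionLiteral (ES2015 11.8.5).
// It only finds where a literal ends; it does not interpret the pattern.
const LINE_TERMINATORS = new Set(["\n", "\r", "\u2028", "\u2029"]);
const isTerminator = (ch: string): boolean => LINE_TERMINATORS.has(ch);

interface ScannedLiteral {
  body: string;
  flags: string;
  end: number; // index just past the last flag character
}

function scanRegExpLiteral(src: string, start: number): ScannedLiteral | null {
  if (src[start] !== "/") return null;
  let i = start + 1;
  let inClass = false;
  while (true) {
    if (i >= src.length || isTerminator(src[i])) return null; // unterminated
    const ch = src[i];
    if (ch === "\\") {
      // RegularExpressionBackslashSequence: `\` plus any non-terminator,
      // so `\p` (or `\P`) is consumed as an opaque pair.
      if (i + 1 >= src.length || isTerminator(src[i + 1])) return null;
      i += 2;
      continue;
    }
    if (inClass) {
      if (ch === "]") inClass = false; // RegularExpressionClass ends
    } else if (ch === "[") {
      inClass = true; // a `/` inside [...] does not end the literal
    } else if (ch === "/") {
      break; // closing slash
    }
    i++;
  }
  const body = src.slice(start + 1, i);
  // `//` starts a comment and `/*` a block comment, not a literal.
  if (body.length === 0 || body[0] === "*") return null;
  i++; // skip the closing slash
  const flagsStart = i;
  // Approximation: ASCII identifier characters stand in for IdentifierPart.
  while (i < src.length && /[A-Za-z0-9_$]/.test(src[i])) i++;
  return { body, flags: src.slice(flagsStart, i), end: i };
}

const r = scanRegExpLiteral("x = /\\p{Letter}+/u;", 4);
console.log(r?.body);  // \p{Letter}+
console.log(r?.flags); // u
```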

The grammar that should be augmented is the one in 21.2.1 (Patterns), so that \pX and \p{...} are recognised as an Atom. A first sketch:

Atom[U] ::
    \  AtomEscape[?U]
    etc.
AtomEscape[U] ::
    DecimalEscape             <--- this is for \123
    CharacterEscape[?U]       <--- this is for \n, \uXXXX, etc. (a designated character)
    CharacterClassEscape[?U]  <--- this is for \d, \s, etc. (a class of characters)
CharacterClassEscape[U] ::
    d 
    D
    s 
    S 
    w 
    W
    [+U] p someLetter
    [+U] p {  someSequence  }
    [+U] P  someLetter 
    [+U] P {  someSequence  }

where someLetter and someSequence remain to be determined.
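To make the [+U] alternatives concrete, here is a hypothetical TypeScript recognizer for the sketched CharacterClassEscape[U] production. Because someLetter and someSequence are still undetermined, it makes two placeholder assumptions, neither of which is part of the sketch:

- someLetter is a single ASCII letter.
- someSequence is one or more characters from [A-Za-z0-9_=].

```typescript
// Hypothetical recognizer for the sketched CharacterClassEscape[U] production.
// ASSUMPTIONS (left open in the thread): someLetter is one ASCII letter, and
// someSequence is one or more characters from [A-Za-z0-9_=].
type ClassEscape =
  | { kind: "builtin"; letter: "d" | "D" | "s" | "S" | "w" | "W" }
  | { kind: "property"; negated: boolean; name: string };

const SOME_LETTER = /^[A-Za-z]$/;
const SOME_SEQUENCE = /^[A-Za-z0-9_=]+$/;

// `rest` is the pattern text right after the backslash; `u` is the [U] parameter.
// Returns the parsed escape and how many characters it consumed, or null.
function parseCharacterClassEscape(
  rest: string,
  u: boolean,
): { escape: ClassEscape; length: number } | null {
  const c = rest[0];
  if (c === "d" || c === "D" || c === "s" || c === "S" || c === "w" || c === "W") {
    return { escape: { kind: "builtin", letter: c }, length: 1 };
  }
  // The [+U] alternatives only exist when the `u` flag is set.
  if (!u || (c !== "p" && c !== "P")) return null;
  const negated = c === "P";
  if (rest[1] === "{") {
    // [+U] p { someSequence }  /  [+U] P { someSequence }
    const close = rest.indexOf("}", 2);
    if (close === -1) return null;
    const name = rest.slice(2, close);
    if (!SOME_SEQUENCE.test(name)) return null;
    return { escape: { kind: "property", negated, name }, length: close + 1 };
  }
  if (rest.length > 1 && SOME_LETTER.test(rest[1])) {
    // [+U] p someLetter  /  [+U] P someLetter
    return { escape: { kind: "property", negated, name: rest[1] }, length: 2 };
  }
  return null;
}

console.log(parseCharacterClassEscape("p{Letter}+", true)?.length); // 9
console.log(parseCharacterClassEscape("pL", false));                // null
```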

mathiasbynens commented 8 years ago

FWIW, ES4 did it very similarly: http://wiki.ecmascript.org/lib/exe/fetch.php?id=spec%3Aspec&cache=cache&media=spec:library-d2.html#RegExp%20grammar

CharacterClassEscape ::
    d                           => charset_digit
    D                           => CharsetComplement( charset_digit )
    s                           => charset_space
    S                           => CharsetComplement( charset_space )
    w                           => charset_word
    W                           => CharsetComplement( charset_word )
    p { UnicodeClass }          => unicodeClass( UnicodeClass )
    P { UnicodeClass }          => CharsetComplement( unicodeClass( UnicodeClass ) )
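The ES4 semantics above can be sketched in TypeScript by modelling a charset as a predicate on code points. CharsetComplement then becomes simple negation. This is an illustration under stated assumptions:

- `charset_space` is abbreviated.
- `unicodeClass` delegates to the `\p{…}` support of an ES2018+ engine rather than consulting Unicode data tables.
- All function names are hypothetical.

```typescript
// A charset modelled as a predicate on code points (an assumption, not ES4's representation).
type Charset = (codePoint: number) => boolean;

// charset_digit and charset_word: the ASCII-only sets used by \d and \w.
const charsetDigit: Charset = (cp) => cp >= 0x30 && cp <= 0x39;
const charsetWord: Charset = (cp) =>
  charsetDigit(cp) ||
  (cp >= 0x41 && cp <= 0x5a) || // A-Z
  (cp >= 0x61 && cp <= 0x7a) || // a-z
  cp === 0x5f;                  // _
// Abbreviated charset_space: TAB, LF, VT, FF, CR and SPACE only; the real \s
// also includes U+00A0, U+FEFF, U+2028, U+2029 and other Zs characters.
const charsetSpace: Charset = (cp) => cp === 0x20 || (cp >= 0x09 && cp <= 0x0d);

// CharsetComplement: everything not in the given set.
function charsetComplement(set: Charset): Charset {
  return (cp) => !set(cp);
}

// unicodeClass: delegates to the engine's \p{...} support (ES2018+);
// the RegExp constructor throws a SyntaxError for unknown class names.
function unicodeClass(name: string): Charset {
  const re = new RegExp(`^\\p{${name}}$`, "u");
  return (cp) => re.test(String.fromCodePoint(cp));
}

// Mirrors the ES4 CharacterClassEscape table: letter (+ class name for p/P) => charset.
function evalClassEscape(letter: string, className?: string): Charset {
  switch (letter) {
    case "d": return charsetDigit;
    case "D": return charsetComplement(charsetDigit);
    case "s": return charsetSpace;
    case "S": return charsetComplement(charsetSpace);
    case "w": return charsetWord;
    case "W": return charsetComplement(charsetWord);
    case "p":
    case "P": {
      if (className === undefined) throw new SyntaxError("\\p and \\P need a {UnicodeClass}");
      const set = unicodeClass(className);
      return letter === "p" ? set : charsetComplement(set);
    }
  }
  throw new SyntaxError(`not a CharacterClassEscape: \\${letter}`);
}

const notLetter = evalClassEscape("P", "L");
console.log(notLetter(0x41)); // false: "A" is a letter
console.log(notLetter(0x33)); // true: "3" is not a letter
```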

mathiasbynens commented 8 years ago

First draft: https://github.com/tc39/ecma262/compare/master...mathiasbynens:unicodePropertyEscape?expand=1

I’m mostly looking for feedback on whether I’m writing the spec text correctly, not on the functionality or features themselves.

Update: Made it into a PR so it’s easier to leave comments on specific lines: https://github.com/mathiasbynens/ecma262/pull/1/files

mathiasbynens commented 7 years ago

https://github.com/tc39/proposal-regexp-unicode-property-escapes