I ran into a problem very similar to the one described in https://github.com/kach/nearley/issues/543. I'm trying to adapt some of the grammars from the ECMAScript standard. For example, here is part of the grammar for IdentifierName:
# https://tc39.es/ecma262/#sec-identifier-names
IdentifierName -> IdentifierStart {% id %}
    | IdentifierName IdentifierPart {% xs => xs.join('') %}
IdentifierStart -> IdentifierStartChar {% id %}
IdentifierPart -> IdentifierPartChar {% id %}
IdentifierStartChar -> UnicodeIDStart {% id %}
    | "$" {% id %}
    | "_" {% id %}
IdentifierPartChar -> UnicodeIDContinue {% id %}
    | "$" {% id %}
    | ZWNJ {% id %}
    | ZWJ {% id %}
ZWNJ -> "\u200C" {% id %}
ZWJ -> "\u200D" {% id %}
UnicodeIDStart -> [\p{ID_Start}] {% id %}
UnicodeIDContinue -> [\p{ID_Continue}] {% id %}
Crucially, UnicodeIDStart and UnicodeIDContinue are defined in terms of Unicode properties. For the \p{ID_Start} and \p{ID_Continue} syntax to work in the RegExp-based charclasses, the u flag also has to be enabled.
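To illustrate why the flag matters, here is a small standalone sketch in plain JavaScript. It does not use nearley at all, and the variable names are only for illustration; it just compares the same charclass source compiled with and without the u flag.

```javascript
// Same character class source, compiled without and with the u flag.
const withoutU = new RegExp("[\\p{ID_Start}]");
const withU = new RegExp("[\\p{ID_Start}]", "u");

// Without u, "\p" is just an escaped "p", so the class is the literal
// set of characters p { I D _ S t a r } and nothing else.
console.log(withoutU.test("{")); // true  (literal brace is in the set)
console.log(withoutU.test("é")); // false (not one of the literal characters)

// With u, \p{ID_Start} is a Unicode property escape.
console.log(withU.test("é")); // true  (a letter, so it has ID_Start)
console.log(withU.test("{")); // false (punctuation)

// "$" and "_" do not have ID_Start, which is why the grammar
// lists them as separate alternatives.
console.log(withU.test("$")); // false
console.log(withU.test("_")); // false
```

So without the flag the charclass still compiles, but it silently matches the wrong characters, which makes the problem easy to miss.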
I'm a very new user of Nearley, so I don't know if it's safe to turn this on for everyone, if it should be opt-in, or if it could cause other problems. What do you think? Is this useful?