flip111 opened this issue 5 years ago
I believe the "a few characters" refers to the Pc (connector punctuation) and Sc (currency symbol) general categories. So IdentifierStart is ID_Start plus Pc, and IdentifierPart is ID_Continue plus Sc.
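The reading above can be turned into code-point predicates. The following is a minimal Go sketch that assumes the UAX #31 derivations: ID_Start is L + Nl + Other_ID_Start minus Pattern_Syntax and Pattern_White_Space, and ID_Continue is ID_Start + Mn + Mc + Nd + Pc + Other_ID_Continue minus the same two sets. The function names are hypothetical. The tables come from Go's unicode package, so they follow whatever Unicode version the Go toolchain ships, which may differ from the version the openCypher grammar targets.

```go
package main

import (
	"fmt"
	"unicode"
)

// isIDStart approximates ID_Start using Go's Unicode tables:
// L + Nl + Other_ID_Start - Pattern_Syntax - Pattern_White_Space.
func isIDStart(r rune) bool {
	if unicode.In(r, unicode.Pattern_Syntax, unicode.Pattern_White_Space) {
		return false
	}
	return unicode.In(r, unicode.L, unicode.Nl, unicode.Other_ID_Start)
}

// isIDContinue approximates ID_Continue:
// ID_Start + Mn + Mc + Nd + Pc + Other_ID_Continue - Pattern_Syntax - Pattern_White_Space.
func isIDContinue(r rune) bool {
	if unicode.In(r, unicode.Pattern_Syntax, unicode.Pattern_White_Space) {
		return false
	}
	return isIDStart(r) ||
		unicode.In(r, unicode.Mn, unicode.Mc, unicode.Nd, unicode.Pc, unicode.Other_ID_Continue)
}

// isIdentifierStart: IdentifierStart = ID_Start | Pc (e.g. '_').
func isIdentifierStart(r rune) bool {
	return isIDStart(r) || unicode.Is(unicode.Pc, r)
}

// isIdentifierPart: IdentifierPart = ID_Continue | Sc (e.g. '$').
func isIdentifierPart(r rune) bool {
	return isIDContinue(r) || unicode.Is(unicode.Sc, r)
}

// isIdentifier checks a whole string: one start rune, then any number of part runes.
func isIdentifier(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		if i == 0 {
			if !isIdentifierStart(r) {
				return false
			}
		} else if !isIdentifierPart(r) {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"name", "_tmp", "price$", "$price", "1abc", "café"} {
		fmt.Printf("%q -> %v\n", s, isIdentifier(s))
	}
}
```

Sc is added after the Pattern_Syntax subtraction, so '$' (which is in Pattern_Syntax) is still accepted as an identifier part. U+2E2F VERTICAL TILDE, a modifier letter that is also in Pattern_Syntax, is rejected as a start character.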
You can check this code for the model's definition of these constructs (derived from Unicode):
Would it be acceptable to change the comments in the XML?
Sure, but please note that in order to accept contributions there is a CLA that must be signed.
Ah, OK, thanks for letting me know. I won't sign anything, so I won't make PRs here. If this isn't important to anyone else, the issue can be closed. Hopefully other people will be able to find this issue and get the information that is not (yet) in the XML.
I'll keep it open for a bit to see if someone (like me) decides to improve the comment, and will close at that point (or before if nothing happens).
I'm looking at https://github.com/opencypher/openCypher/blob/a514465ca5f1f975985c28d1c9db03c111791671/grammar/basic-grammar.xml#L718-L740 (well, actually at the EBNF version that can be downloaded from the website, the legacy one).
How should I implement this? When you look up Unicode ID_Start, there is a jungle of documentation, but I only need a regex for it. Fortunately, Unicode provides a tool for this: https://unicode.org/cldr/utility/regex.jsp?a=%5B%3AID_Start%3A%5D&b=
That leaves the question: what about "And extended with a few characters"? So perhaps, to clear up this section of the spec, some of the following could be done:
I think the Unicode tool gives the regular expression in a standard regex syntax, but other engines, such as PCRE, can define their own shorthand classes that could be used instead. Therefore it can be useful to include regular expressions in the comments.
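As an example of the kind of regex that could go into such a comment, here is a hedged sketch in Go's RE2 syntax. It is an approximation, not an exact translation: RE2 accepts general categories such as \p{L}, but not binary properties such as ID_Start, and it has no class subtraction. The pattern is therefore built from the categories that ID_Start and ID_Continue are derived from. The name identifierRE is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// identifierRE is an approximation of
//
//	IdentifierStart = ID_Start | Pc
//	IdentifierPart  = ID_Continue | Sc
//
// using only general categories:
//
//	start: \p{L} \p{Nl} \p{Pc}
//	part:  the start categories plus \p{Mn} \p{Mc} \p{Nd} \p{Sc}
//
// Known gaps: Other_ID_Start / Other_ID_Continue code points (e.g. U+00B7
// MIDDLE DOT) are rejected, and Pattern_Syntax letters (e.g. U+2E2F
// VERTICAL TILDE) are accepted, because RE2 cannot subtract classes.
var identifierRE = regexp.MustCompile(
	`^[\p{L}\p{Nl}\p{Pc}][\p{L}\p{Nl}\p{Pc}\p{Mn}\p{Mc}\p{Nd}\p{Sc}]*$`)

func main() {
	for _, s := range []string{"name", "_tmp", "price$", "$price", "1abc", "a-b"} {
		fmt.Printf("%q -> %v\n", s, identifierRE.MatchString(s))
	}
}
```

An engine that can name ID_Start and ID_Continue directly, or that supports class subtraction, would not need these approximations. That is one reason why stating the intended regex (and the engine it is written for) in the grammar comments could help implementers.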