Open jacobfriedman opened 2 years ago
IdentifierStart = ID_Start / Pc
IdentifierPart = ID_Continue / Sc
This is also something we should know about.
(* Based on the unicode identifier and pattern syntax
* (http://www.unicode.org/reports/tr31/)
* And extended with a few characters.
*)IdentifierStart = ID_Start
| Sc
| '_'
| '‿'
| '⁀'
| '⁔'
| '︳'
| '︴'
| '﹍'
| '﹎'
| '﹏'
| '＿'
;
(* Based on the unicode identifier and pattern syntax
* (http://www.unicode.org/reports/tr31/)
* And extended with a few characters.
*)IdentifierPart = ID_Continue
| Sc
;
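For readers who need something executable while `ID_Start`, `ID_Continue` and `Sc` stay undefined in the EBNF, here is a minimal Python sketch of the two rules as written above. It is an approximation, not the grammar's definition: it stands in Unicode general categories for the UAX #31 properties (ignoring `Other_ID_Start`, `Other_ID_Continue` and the `Pattern_Syntax` / `Pattern_White_Space` exclusions), and it assumes the enumerated connector characters correspond to category `Pc`, as the issue title suggests. The function names are hypothetical.

```python
import unicodedata

# Assumption: these general categories approximate the UAX #31 ID_Start property.
_ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Assumption: categories that ID_Continue adds on top of ID_Start.
_ID_CONTINUE_EXTRA = {"Mn", "Mc", "Nd", "Pc"}


def is_identifier_start(ch: str) -> bool:
    """Approximate IdentifierStart: ID_Start, a currency symbol (Sc), or a connector (Pc)."""
    cat = unicodedata.category(ch)
    return cat in _ID_START_CATEGORIES or cat in {"Sc", "Pc"}


def is_identifier_part(ch: str) -> bool:
    """Approximate IdentifierPart: ID_Continue or a currency symbol (Sc)."""
    cat = unicodedata.category(ch)
    return cat in _ID_START_CATEGORIES or cat in _ID_CONTINUE_EXTRA or cat == "Sc"
```

Results depend on the Unicode version bundled with the Python interpreter, so a parser that must match a specific openCypher release may prefer explicit code point tables.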
(* Any character except "`", enclosed within `backticks`. Backticks are escaped with double backticks. *)EscapedSymbolicName = { '`', { ANY - ('`') }, '`' }- ;
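The escaped-name rule can be sketched with a regular expression. This is one reading of the rule, not a reference implementation: one or more adjacent backtick-delimited segments, where two adjacent segments produce the doubled backtick that stands for a literal backtick. The function name and return shape are hypothetical.

```python
import re

# One or more adjacent `...` segments; a doubled backtick arises where two segments meet.
_ESCAPED_NAME = re.compile(r"(?:`[^`]*`)+")


def parse_escaped_symbolic_name(text: str, pos: int = 0):
    """Return (unescaped_name, end_position), or None if no escaped name starts at pos."""
    match = _ESCAPED_NAME.match(text, pos)
    if match is None:
        return None
    raw = match.group(0)
    # Drop the outer backticks, then collapse each doubled backtick to a single one.
    return raw[1:-1].replace("``", "`"), match.end()
```

For example, ``parse_escaped_symbolic_name("`a``b`")`` yields the name ``a`b`` and the end position 6.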
Also mentioned in https://github.com/opencypher/openCypher/issues/331
Had to add additional rules outside this parser. Would have been nice to source those inside but hey, who wants to include the whole unicode standard :(
Watch out for EOI.
Hi. That is correct. If you are using a Java compatible language, have a look at this class here: https://github.com/opencypher/openCypher/blob/master/tools/grammar/src/main/java/org/opencypher/grammar/CharacterSet.java#L101 It includes the codepoint definitions of character sets used in the grammar.
Thanks. I had sourced those from the Unicode standard.
It would have been helpful to have a comment at the top of the EBNF as it is certainly incomplete.
Hi @jacobfriedman, do you have a (code) example of what you had to do to fix this? It's not completely apparent to me.
Hi. That is correct. If you are using a Java compatible language, have a look at this class here: https://github.com/opencypher/openCypher/blob/master/tools/grammar/src/main/java/org/opencypher/grammar/CharacterSet.java#L101 It includes the codepoint definitions of character sets used in the grammar.
Also, @nadja-muller, could you clarify what the suggestion is for Java? Are we supposed to leverage (invoke) this class directly, or are we using the class as a reference to look up the definitions of ID_Start and ID_Continue so as to hardcode them into the EBNF?
I believe this had to be hard-coded, which makes this specification invalid; there is no definition.
It would be nice to include the source, though.
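As an illustration of what hard-coding can look like, the following Python sketch generates an explicit EBNF rule for a Unicode general category such as `Pc` or `Sc`. It is not the approach used by anyone in this thread; the helper names are hypothetical, and the output depends on the Unicode version shipped with the Python interpreter.

```python
import sys
import unicodedata


def chars_in_category(category: str) -> list:
    """List every character whose Unicode general category equals `category`."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == category
    ]


def ebnf_rule(name: str, category: str) -> str:
    """Render the category as one EBNF rule made of quoted single-character alternatives."""
    alternatives = " | ".join(f"'{ch}'" for ch in chars_in_category(category))
    return f"{name} = {alternatives} ;"


if __name__ == "__main__":
    print(ebnf_rule("Pc", "Pc"))
    print(ebnf_rule("Sc", "Sc"))
```

The same loop does not extend directly to `ID_Start` and `ID_Continue`, because those are derived properties rather than general categories; they would have to come from the Unicode data files or from a table such as the `CharacterSet` class linked above.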
Unfortunately, trying to parse these raises the question, "Is there something I should know about?" Please include these references (VT, FF, CR, FS, etc.) in the grammar. Otherwise I can't parse 'SPACE' or 'TAB'. Granted, these are easy additions on my end, but the grammar doesn't work out of the box.
Even if these were provided now, the question would remain until I ran the parser again: nothing should be implicit in an EBNF grammar file.
Thank you for the great work!
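A minimal sketch of the kind of lookup table a parser author ends up adding for the named characters mentioned above. It assumes the conventional ASCII meanings of those names and covers only the ones listed in this issue; the grammar's own set may be larger. The table and function names are hypothetical.

```python
# Assumption: conventional ASCII meanings of the names referenced by the grammar.
NAMED_CHARACTERS = {
    "TAB": "\u0009",    # horizontal tab
    "VT": "\u000B",     # vertical tab
    "FF": "\u000C",     # form feed
    "CR": "\u000D",     # carriage return
    "FS": "\u001C",     # file separator
    "SPACE": "\u0020",  # space
}


def resolve_named_character(name: str) -> str:
    """Return the character for a grammar name, failing loudly on undefined names."""
    try:
        return NAMED_CHARACTERS[name]
    except KeyError:
        raise KeyError(f"no definition for character name {name!r}") from None
```

Raising on an unknown name, instead of skipping it, makes every implicit reference in the grammar visible in one parser run.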