Incomplete EBNF Grammar

jacobfriedman commented 2 years ago

whitespace = SPACE
           | TAB
           | LF
           | VT
           | FF
           | CR
           | FS
           | GS
           | RS
           | US

Unfortunately, trying to parse these raises the question, 'Is there something I should know about?' Please include definitions for these references (VT, FF, CR, FS, etc.) in the grammar. Otherwise I can't parse 'SPACE'... or 'TAB'. Granted, these are easy additions on my end, but it just doesn't work out-of-the-box.

Even if these definitions were provided now, the question would still stand: I can't tell what else is missing unless I run the parser again. Nothing should be implicit in an EBNF grammar file.

Thank you for the great work!
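
For anyone hitting the same gap: the names in the `whitespace` rule are the conventional ASCII control-character abbreviations, so one way to make them explicit is a small lookup table. The sketch below is a minimal Python illustration, not part of the openCypher tooling; `WHITESPACE_CODE_POINTS` and `is_grammar_whitespace` are hypothetical names, and the table covers only the ten alternatives quoted above.

```python
# Hypothetical helper: code points for the terminal names used in the
# quoted `whitespace` rule (standard ASCII assignments).
WHITESPACE_CODE_POINTS = {
    "SPACE": 0x20,  # space
    "TAB": 0x09,    # horizontal tab
    "LF": 0x0A,     # line feed
    "VT": 0x0B,     # vertical tab
    "FF": 0x0C,     # form feed
    "CR": 0x0D,     # carriage return
    "FS": 0x1C,     # file separator
    "GS": 0x1D,     # group separator
    "RS": 0x1E,     # record separator
    "US": 0x1F,     # unit separator
}

_WHITESPACE_CHARS = frozenset(chr(cp) for cp in WHITESPACE_CODE_POINTS.values())


def is_grammar_whitespace(ch: str) -> bool:
    """Return True if `ch` is one of the ten characters listed above."""
    return ch in _WHITESPACE_CHARS
```

A parser generator that has no built-in notion of `SPACE` or `TAB` could be fed these values as explicit terminals.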

jacobfriedman commented 2 years ago

IdentifierStart = ID_Start / Pc
IdentifierPart = ID_Continue / Sc

This is also something we should know about.
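
To make the missing pieces concrete, here is a rough Python sketch of the two rules as quoted above. It is an approximation under stated assumptions, not the grammar's official definition: `ID_Start` and `ID_Continue` are approximated from Unicode general categories as described in UAX #31 (letters and `Nl` for start; additionally `Mn`, `Mc`, `Nd`, `Pc` for continue), and the `Other_ID_Start`, `Other_ID_Continue`, and pattern-syntax adjustments are ignored. The function names are hypothetical.

```python
import unicodedata

# Approximation of ID_Start: letters plus letter numbers (assumption:
# Other_ID_Start and Pattern_Syntax adjustments are ignored).
_ID_START_CATEGORIES = frozenset({"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"})
# Approximation of ID_Continue: ID_Start plus marks, digits, connectors.
_ID_CONTINUE_CATEGORIES = _ID_START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_identifier_start(ch: str) -> bool:
    """IdentifierStart = ID_Start / Pc (as quoted above)."""
    category = unicodedata.category(ch)
    return category in _ID_START_CATEGORIES or category == "Pc"


def is_identifier_part(ch: str) -> bool:
    """IdentifierPart = ID_Continue / Sc (as quoted above)."""
    category = unicodedata.category(ch)
    return category in _ID_CONTINUE_CATEGORIES or category == "Sc"


def is_identifier(text: str) -> bool:
    """Check a whole string against the two rules."""
    return (
        bool(text)
        and is_identifier_start(text[0])
        and all(is_identifier_part(ch) for ch in text[1:])
    )
```

The results depend on the Unicode version bundled with the Python interpreter, which is one more reason the grammar should state its definitions explicitly.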

jacobfriedman commented 2 years ago

(* Based on the unicode identifier and pattern syntax
 *   (http://www.unicode.org/reports/tr31/)
 * And extended with a few characters.
 *)
IdentifierStart = ID_Start
                | Sc
                | '_'
                | '‿'
                | '⁀'
                | '⁔'
                | '︳'
                | '︴'
                | '﹍'
                | '﹎'
                | '﹏'
                | '＿'
                ;
(* Based on the unicode identifier and pattern syntax
 *   (http://www.unicode.org/reports/tr31/)
 * And extended with a few characters.
 *)
IdentifierPart = ID_Continue
               | Sc
               ;
(* Any character except "`", enclosed within `backticks`. Backticks are escaped with double backticks. *)
EscapedSymbolicName = { '`', { ANY - ('`') }, '`' }- ;

from https://github.com/paul-english/nom-ebnf/blob/83e6b84c300c653b5aa315152bbfd2a44d9a671b/src/cypher.ebnf
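
The `EscapedSymbolicName` rule above can be read as "one or more backtick-delimited segments", which is how a doubled backtick ends up representing a literal one. A minimal Python sketch of that reading follows; the regular expression and the name `unescape_symbolic_name` are illustrative assumptions, not code from the openCypher project.

```python
import re
from typing import Optional

# One or more segments of the form `...`, where the body contains no backtick.
# Adjacent segments (e.g. `a``b`) are how a doubled backtick is matched.
_ESCAPED_NAME = re.compile(r"(?:`[^`]*`)+")


def unescape_symbolic_name(token: str) -> Optional[str]:
    """Return the name inside the backticks, or None if `token` does not match."""
    if _ESCAPED_NAME.fullmatch(token) is None:
        return None
    # Drop the outer backticks, then collapse each doubled backtick to one.
    return token[1:-1].replace("``", "`")
```

For example, `unescape_symbolic_name("`a``b`")` yields the name ``a`b``, while an unbalanced token such as "`a`b`" is rejected.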

jacobfriedman commented 2 years ago

Also mentioned in https://github.com/opencypher/openCypher/issues/331

jacobfriedman commented 1 year ago

Had to add additional rules outside this parser. It would have been nice to source those inside, but hey, who wants to include the whole Unicode standard :(

Watch out for EOI.

nadja-muller commented 1 year ago

Hi. That is correct. If you are using a Java compatible language, have a look at this class here: https://github.com/opencypher/openCypher/blob/master/tools/grammar/src/main/java/org/opencypher/grammar/CharacterSet.java#L101 It includes the codepoint definitions of character sets used in the grammar.

jacobfriedman commented 1 year ago

Thanks. I had sourced those from the Unicode standard.

It would have been helpful to have a comment at the top of the EBNF as it is certainly incomplete.



vincent-karuri commented 3 months ago

Hi @jacobfriedman, do you have a (code) example of what you had to do to fix this? It's not completely apparent to me.

vincent-karuri commented 3 months ago

> Hi. That is correct. If you are using a Java compatible language, have a look at this class here: https://github.com/opencypher/openCypher/blob/master/tools/grammar/src/main/java/org/opencypher/grammar/CharacterSet.java#L101 It includes the codepoint definitions of character sets used in the grammar.

Also, @nadja-muller, could you clarify what the suggestion is for Java? Are we supposed to leverage (invoke) this class directly or are we using the class as a reference to look up the definitions of ID_Start and ID_Continue so as to hardcode them into the ebnf?

jacobfriedman commented 3 months ago

I believe this had to be hard-coded, which makes this specification invalid; there is no definition.

It would be nice to include the source, though.
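
As a rough illustration of what "hard-coded" can look like, the Python sketch below enumerates every code point in a given Unicode general category and prints it as an EBNF-style rule. This is not a record of what was actually done for this issue; the function names are hypothetical, the `? U+XXXX ?` special-sequence notation is just one possible way to spell code points, and the output depends on the Unicode version shipped with the interpreter.

```python
import sys
import unicodedata
from typing import List, Tuple


def category_ranges(category: str) -> List[Tuple[int, int]]:
    """Collect inclusive (first, last) code point ranges for a general category."""
    ranges: List[Tuple[int, int]] = []
    start = None
    last = 0
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) == category:
            if start is None:
                start = cp
            last = cp
        elif start is not None:
            ranges.append((start, last))
            start = None
    if start is not None:
        ranges.append((start, last))
    return ranges


def ebnf_rule(name: str, category: str) -> str:
    """Render the ranges as alternatives of an EBNF-style rule."""
    alternatives = []
    for first, last in category_ranges(category):
        if first == last:
            alternatives.append(f"? U+{first:04X} ?")
        else:
            alternatives.append(f"? U+{first:04X}..U+{last:04X} ?")
    return f"{name} = " + "\n   | ".join(alternatives) + "\n   ;"


if __name__ == "__main__":
    print(ebnf_rule("Pc", "Pc"))
    print(ebnf_rule("Sc", "Sc"))
```

Generating the rules this way keeps the grammar file self-contained, at the cost of pinning it to one Unicode version.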



opencypher / openCypher

Incomplete EBNF Grammar #534