shellscape opened this issue 2 years ago
Additionally, this package cannot parse the EBNF grammar that railroad shows on its site:
```ts
import { Grammars } from 'ebnf';

// The grammar shown on the railroad site, pasted verbatim
const w3grammar = `Grammar ::= Production*
Production ::= NCName '::=' ( Choice | Link )
NCName ::= [http://www.w3.org/TR/xml-names/#NT-NCName]
Choice ::= SequenceOrDifference ( '|' SequenceOrDifference )*
SequenceOrDifference ::= (Item ( '-' Item | Item* ))?
Item ::= Primary ( '?' | '*' | '+' )*
Primary ::= NCName | StringLiteral | CharCode | CharClass | '(' Choice ')'
StringLiteral ::= '"' [^"]* '"' | "'" [^']* "'"
/* ws: explicit */
CharCode ::= '#x' [0-9a-fA-F]+
CharClass ::= '[' '^'? ( Char | CharCode | CharRange | CharCodeRange )+ ']'
Char ::= [http://www.w3.org/TR/xml#NT-Char]
CharRange ::= Char '-' ( Char - ']' )
CharCodeRange ::= CharCode '-' CharCode
Link ::= '[' URL ']'
URL ::= [^#x5D:/?#]+ '://' [^#x5D#]+ ('#' NCName)?
Whitespace ::= S | Comment
S ::= #x9 | #xA | #xD | #x20
Comment ::= '/*' ( [^*] | '*'+ [^*/] )* '*'* '*/'`;

// Convert the W3C-notation grammar source into node-ebnf rules
const rules = Grammars.W3C.getRules(w3grammar);
```
This also fails, throwing `new Error('Could not parse ' + source)` from the same line and position.
Hello, can you try ending the document/grammar string with a line-ending character?
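A minimal sketch of that workaround, assuming the only problem is the missing final line ending. The helper name `withTrailingNewline` is hypothetical and not part of the ebnf package; its result is meant to be passed to `Grammars.W3C.getRules` in place of the raw string.

```typescript
// Hypothetical helper (not part of the ebnf package): make sure a grammar
// string ends with a line-ending character before it is parsed.
function withTrailingNewline(source: string): string {
  // Sources that already end in "\n" (including "\r\n") are returned unchanged
  return source.endsWith("\n") ? source : source + "\n";
}

// Usage, assuming the `w3grammar` string and ebnf import from the report:
// const rules = Grammars.W3C.getRules(withTrailingNewline(w3grammar));
```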
Your Char production looks hosed:
```
Char ::= [http://www.w3.org/TR/xml#NT-Char]
```
(A URL doesn't belong in a bracket expression.)
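The railroad grammar itself allows a production's right-hand side to be a `Link ::= '[' URL ']'` rather than a `Choice`, so `Char ::= [http://...]` is a link to an external definition, not a character class. If those links are what trips the parser, one hedged workaround is to find them and substitute real definitions before parsing. A minimal sketch; the function name is hypothetical, and the pattern assumes each link sits alone on its production's line in the form `Name ::= [http://...]`.

```typescript
// Hypothetical helper: map each production whose right-hand side is only a
// bracketed URL link (e.g. `Char ::= [http://www.w3.org/TR/xml#NT-Char]`)
// to that URL, so the caller can swap in a real definition.
function findUrlPlaceholders(grammar: string): Map<string, string> {
  const found = new Map<string, string>();
  // One production per line: Name ::= [http(s)://...], optional spaces/tabs,
  // optional "\r" before the line end. The g flag makes exec() iterate.
  const pattern =
    /^[ \t]*([A-Za-z_][\w.-]*)[ \t]*::=[ \t]*\[(https?:\/\/[^\]\s]+)\][ \t]*\r?$/gm;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(grammar)) !== null) {
    found.set(match[1], match[2]);
  }
  return found;
}
```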
@kjhughes that's straight from W3C
The RHS is clearly meant to be metadata / documentation, not an EBNF regex. The URL references this EBNF:
```
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
```
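For illustration, the referenced `Char` production is just a set of code-point ranges. A TypeScript transcription of it (the names `isXmlChar` and `isXmlText` are hypothetical, and ES2015+ is assumed for `for...of` over strings and `codePointAt`) might look like this:

```typescript
// Transcription of XML 1.0 `Char`:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
function isXmlChar(codePoint: number): boolean {
  return (
    codePoint === 0x9 ||
    codePoint === 0xa ||
    codePoint === 0xd ||
    (codePoint >= 0x20 && codePoint <= 0xd7ff) ||
    (codePoint >= 0xe000 && codePoint <= 0xfffd) ||
    (codePoint >= 0x10000 && codePoint <= 0x10ffff)
  );
}

// Check every code point of a string; for...of iterates by code point,
// so surrogate pairs are seen as one character and lone surrogates fail.
function isXmlText(text: string): boolean {
  for (const ch of text) {
    if (!isXmlChar(ch.codePointAt(0)!)) return false;
  }
  return true;
}
```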
@menduz: I just tried adding a newline at the end, and that seemed to do the trick!
Might it be worthwhile not to fail when there is no final newline character?
I've tried adding a newline and am still not having any success. I've also been trying to parse https://github.com/messagetemplates/grammar/blob/master/message-template.ebnf without success.
Yes, adding a newline at the end of the string is a great tip! Additionally, even though the parser only gives you a yes/no as to whether it parsed successfully, you can quickly narrow down the problem in the playground
https://menduz.github.io/ebnf-highlighter/
by starting with just one line at a leaf of your parse tree and building your EBNF file back up from there.
For example, does this parse?
```
_LETTER-OR-DIGIT ::= [A-Za-z0-9]
```
No. How about this?
```
_LETTERORDIGIT ::= [A-Za-z0-9]
```
No. How about now?
```
LETTERORDIGIT ::= [A-Za-z0-9]
```
Yes. So does W3C EBNF not support an NCName starting with an underscore? Well, let's look at the node-ebnf source code. This is the top of W3CEBNF.ts:
```ts
// https://www.w3.org/TR/REC-xml/#NT-Name
// http://www.bottlecaps.de/rr/ui
// Grammar ::= Production*
// Production ::= NCName '::=' Choice
// NCName ::= [http://www.w3.org/TR/xml-names/#NT-NCName]
// Choice ::= SequenceOrDifference ( '|' SequenceOrDifference )*
// SequenceOrDifference ::= (Item ( '-' Item | Item* ))?
// Item ::= Primary ( '?' | '*' | '+' )?
// Primary ::= NCName | StringLiteral | CharCode | CharClass | '(' Choice ')'
// StringLiteral ::= '"' [^"]* '"' | "'" [^']* "'"
// CharCode ::= '#x' [0-9a-fA-F]+
// CharClass ::= '[' '^'? ( RULE_Char | CharCode | CharRange | CharCodeRange )+ ']'
// RULE_Char ::= [http://www.w3.org/TR/xml#NT-RULE_Char]
// CharRange ::= RULE_Char '-' ( RULE_Char - ']' )
// CharCodeRange ::= CharCode '-' CharCode
// RULE_WHITESPACE ::= RULE_S | Comment
// RULE_S ::= #x9 | #xA | #xD | #x20
// Comment ::= '/*' ( [^*] | '*'+ [^*/] )* '*'* '*/'
```
That tells us to look it up here: http://www.w3.org/TR/xml-names/#NT-NCName
Click through to Name: https://www.w3.org/TR/REC-xml/#NT-Name
Click through to NameStartChar: https://www.w3.org/TR/REC-xml/#NT-NameStartChar
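The NameStartChar and NameChar productions on that page are plain character ranges, so the rule can be checked directly. A hedged TypeScript sketch (the function name `isNCName` is hypothetical) transcribing the XML 1.0 Fifth Edition ranges, with the colon removed because the Namespaces spec defines NCName as a Name without `:`:

```typescript
// NameStartChar from XML 1.0 (Fifth Edition), minus ":" (excluded for NCName).
// Note "_" is right there in the start set.
const NAME_START =
  "A-Z_a-z\\u{C0}-\\u{D6}\\u{D8}-\\u{F6}\\u{F8}-\\u{2FF}\\u{370}-\\u{37D}" +
  "\\u{37F}-\\u{1FFF}\\u{200C}-\\u{200D}\\u{2070}-\\u{218F}\\u{2C00}-\\u{2FEF}" +
  "\\u{3001}-\\u{D7FF}\\u{F900}-\\u{FDCF}\\u{FDF0}-\\u{FFFD}\\u{10000}-\\u{EFFFF}";

// NameChar adds "-", ".", digits, #xB7 and two combining ranges.
// The "-" is placed first so the character class treats it literally.
const NAME_CHAR = "-.0-9\\u{B7}\\u{300}-\\u{36F}\\u{203F}-\\u{2040}" + NAME_START;

// The "u" flag enables \u{...} escapes and code-point matching.
const NCNAME = new RegExp(`^[${NAME_START}][${NAME_CHAR}]*$`, "u");

function isNCName(name: string): boolean {
  return NCNAME.test(name);
}
```

Under this reading, `_LETTERORDIGIT` and `_LETTER-OR-DIGIT` are both valid NCNames.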
Oh dear, it does look to me like you're supposed to be able to start an NCName with an underscore. So it does seem a shame that node-ebnf won't parse this. But hopefully what I've been able to demonstrate about how I would isolate a fault and investigate its cause is helpful?
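The one-production-at-a-time narrowing described in this comment can also be scripted. A minimal sketch under stated assumptions: the function name is hypothetical, the productions are supplied leaves first so every referenced rule is already present, and `parses` is whatever yes/no check you have (for node-ebnf, presumably a try/catch around `Grammars.W3C.getRules`).

```typescript
// Hypothetical helper: add productions one at a time (leaves first) and
// report the index of the first one that makes the parser reject the grammar.
function firstBreakingProduction(
  productions: string[],
  parses: (grammar: string) => boolean,
): number {
  const accepted: string[] = [];
  for (let i = 0; i < productions.length; i++) {
    accepted.push(productions[i]);
    // Re-check the growing grammar, always ending it with a newline
    if (!parses(accepted.join("\n") + "\n")) {
      return i; // first production the parser rejects
    }
  }
  return -1; // the whole grammar was accepted
}

// Assumed predicate for node-ebnf (not verified against the package):
// const parses = (g: string) => { try { Grammars.W3C.getRules(g); return true; } catch { return false; } };
```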
The grammar at https://github.com/transpect/css-tools/blob/master/ebnf-scheme/CSS3.ebnf is valid W3C EBNF, as verified with railroad (https://bottlecaps.de/rr/ui). This package throws an error saying it could not parse the grammar, at /node_modules/ebnf/dist/Grammars/W3CEBNF.js:288:19.
So it looks like there are some compatibility issues. Perhaps the package's W3C grammar is out of date, given the package's age?