shexSpec / shexTest

ShEx test suite
https://shexspec.github.io/shexTest/
Other
3 stars 4 forks source link

Is 1bnodeRefORRefMinlength.shex valid ShEx? #2

Closed gkellogg closed 7 years ago

gkellogg commented 7 years ago

Trying to parse 1bnodeRefORRefMinlength:

<http://a.example/S1> {
   <http://a.example/p1> BNODE @<http://a.example/S1> OR MINLENGTH 5 @<http://a.example/S1>
}

The MINLENGTH 5 @<http://a.example/S1> does not look like a valid Atom. From the grammar:

[29]    shapeAtom      ::=      "LITERAL" (xsFacet)*
                                      | nonLiteralKind (stringFacet)* (shapeOrRef)?
                                      | datatype (xsFacet)*
                                      | shapeOrRef (stringFacet)*
                                      | valueSet
                                      | "(" shapeExpression ")"
                                      | "."

It seems that it would need to be @<http://a.example/S1> MINLENGTH 5 to match shapeOrRef (stringFacet)*; if it had a nonLiteralKind, the facet can precede the shapeRef. Am I missing something?

gkellogg commented 7 years ago

Also, 1datatypeRef1 doesn't parse for me, and Yacker also fails to parse it.

ericprud commented 7 years ago

I think this is now fine per Yacker (see 1bnodeRefORRefMinlength and 1datatypeRef1). Harold also checked it against the ANTLR grammar. (Do you want to use that for RDF.rb? It's currently used for Labra's Java implementation and Harold's python implementation so it's pretty well exercised.)

If you were using the yacker ShEx2 grammar; it was messed up and I apologize profusely. I have since updated it with the grammar from the spec with two extra reservations for 2.1 or 2.2 features:

# includeSets are for inheritance:
- [30] shapeDefinition ::= (extraPropertySet | "CLOSED")* "{" tripleExpression? "}" annotation* semanticActions
- [31] inlineShapeDefinition ::= (extraPropertySet | "CLOSED")* '{' tripleExpression? '}'
+ [30] shapeDefinition ::= (includeSet | extraPropertySet | "CLOSED")* "{" tripleExpression? "}" annotation* semanticActions
+ [31] inlineShapeDefinition ::= (includeSet | extraPropertySet | "CLOSED")* '{' tripleExpression? '}'
+ [57] includeSet ::= "&" shapeLabel+

# current semantics don't include negation on TripleConstraints (see negation branch)
- [45] senseFlags ::= "^"
+ [45] senseFlags ::= "!" "^"? | "^" "!"?
gkellogg commented 7 years ago

Okay, I've been using an out of date grammar, apparently. I'm parsing most of the schema files so far, and I expect I'll likely run into more issues with the grammar using my LL(1) parser generator. I'll update to the new version once I've finished my first pass. Is https://www.w3.org/2005/01/yacker/uploads/ShEx2/bnf the most current location? I was using the one at https://github.com/shexSpec/grammar. Should that be current (or made current)?

FWIW, these are the changes I made to the old grammar to get past first/first conflicts:

# These productions cause conflicts. The substituted is close enough.
# [10]    someOfShape           ::= groupShape | multiElementSomeOf
# [10a]   multiElementSomeOf    ::= groupShape ('|' groupShape)+
# [10b]   innerShape            ::= multiElementGroup | multiElementSomeOf
[10]    someOfShape           ::= groupShape ('|' groupShape)*

# These productions cause conflicts. The substituted is equivalent, except for allowing
# an arbitrary number of trailing ';'
# [11]    groupShape            ::= singleElementGroup | multiElementGroup
# [11]    singleElementGroup    ::= unaryShape ';'?
# [11a]   multiElementGroup     ::= unaryShape (';' unaryShape)+ ';'?
[11]    groupShape            ::= unaryShape (';' unaryShape?)*

# The following cause conflicts. The new one is pretty much the same.
#[12]    unaryShape            ::= productionLabel? tripleConstraint | include | productionLabel? encapsulatedShape
[12]    unaryShape            ::= productionLabel? (tripleConstraint | encapsulatedShape) | include

# The following cause conflicts. The new one is pretty much the same.
# [13]    encapsulatedShape     ::= '(' innerShape ')' cardinality? annotation* semanticActions
[13]    encapsulatedShape     ::= '(' groupShape ')' cardinality? annotation* semanticActions
ericprud commented 7 years ago

~ shexSpec/grammar/bnf updated. The accompanying ANTLR grammar should be up to date.

gkellogg commented 7 years ago

I've updated to this grammar, and generate S-Expressions for all schema files other than _all and kitchenSink. I ended up making some small changes to the grammar. My version is here.

Rule 6 uses a capitalized version of 'START', which is specific to my parser (although it does accept arbitrarily capitalized input):

# "START" easier for parser than "start"
[6]     start                 ::= "START" '=' shapeExpression

Rules 34-36 have the noted LL(1) errors:

# First/First Conflicts on "&", "$", "(", "^", :RDF_TYPE , :IRIREF, :PNAME_LN and :PNAME_NS 
#[34]    someOfTripleExpr      ::= groupTripleExpr | multiElementSomeOf
#[35]    multiElementSomeOf    ::= groupTripleExpr ('|' groupTripleExpr)+
#[36]    innerTripleExpr       ::= multiElementGroup | multiElementSomeOf
[34]    someOfTripleExpr      ::= groupTripleExpr ('|' groupTripleExpr)*

The replaced rule 34 avoids these problems, and makes the grammar simpler.

Same for rules 37-39:

# First/First Conflicts on "&", "$", "(", "^", :RDF_TYPE , :IRIREF, :PNAME_LN and :PNAME_NS 
# First/Follow Conflict on ";"
#[37]    groupTripleExpr       ::= singleElementGroup | multiElementGroup
#[38]    singleElementGroup    ::= unaryTripleExpr ';'?
#[39]    multiElementGroup     ::= unaryTripleExpr (';' unaryTripleExpr)+ ';'?
[37]    groupTripleExpr       ::= unaryTripleExpr (';' unaryTripleExpr?)*

This also required changing rule 41 to avoid the use of a removed production:

# Use someOfTripleExpr instead of innerTripleExpr
#[41]    bracketedTripleExpr   ::= '(' innerTripleExpr ')' cardinality? annotation* semanticActions
[41]    bracketedTripleExpr   ::= '(' someOfTripleExpr ')' cardinality? annotation* semanticActions

You can find S-Expression versions of all transformed schema files here. Of course, I'm not actually executing these yet, but they seem to have the same structure as the JSON files, for all practical purposes.

Closing this issue, as the new grammar fixed the originally reported issues.