Open gkellogg opened 2 years ago
While we're dealing with this, I think we can add a bit more sanity checking by requiring that the exclusions be homogeneous. As a counterexample, consider:
foo:code [. # any RDF term...
- 'a'~ - 'e'~ # ... except strings starting with 'a' or 'e'
- @en-UK~ - @fr~ # ... or British or French RDF langStrings (regardless of region, script, etc.)
]
Would it permit this?
<s> foo:code <http://a.example> .
The grammar would imply that it does, but in ShExJ we see that exclusions are typed, e.g. LiteralStemRange and LanguageStemRange in:
{ "type": "TripleConstraint",
"predicate": "...code",
"valueExpr": {
"type": "NodeConstraint",
"values": [
{ "type": "LiteralStemRange",
"stem": { "type": "Wildcard" },
"exclusions": [
{ "type": "LiteralStem", "stem": "a" },
{ "type": "LiteralStem", "stem": "e" }
] },
{ "type": "LanguageStemRange",
"stem": { "type": "Wildcard" },
"exclusions": [
"en-UK",
"fr"
] }
] } }
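To make the question concrete, here is a minimal Python sketch of how the ShExJ above might be evaluated against a node. It rests on a simplified reading of the value-set semantics, all of which are assumptions: a wildcard LiteralStemRange only matches literals, a wildcard LanguageStemRange only matches language-tagged strings, and every exclusion (including the plain-string language ones) is treated as a stem, following the ShExC comments. The `Node` tuple and the functions `matches_range` and `matches_value_set` are hypothetical names for illustration, not part of any ShEx implementation.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """Hypothetical, simplified RDF term; kind is 'iri' or 'literal'."""
    kind: str
    value: str
    lang: Optional[str] = None


# The two typed ranges from the ShExJ above.
VALUES = [
    {"type": "LiteralStemRange", "stem": {"type": "Wildcard"},
     "exclusions": [{"type": "LiteralStem", "stem": "a"},
                    {"type": "LiteralStem", "stem": "e"}]},
    {"type": "LanguageStemRange", "stem": {"type": "Wildcard"},
     "exclusions": ["en-UK", "fr"]},
]


def _stem(excl):
    # Assumption: plain-string exclusions are treated as stems too.
    return excl["stem"] if isinstance(excl, dict) else excl


def _lang_has_prefix(tag, stem):
    # Case-insensitive match on whole subtags: 'fr' covers 'fr' and 'fr-BE'.
    tag, stem = tag.lower(), stem.lower()
    return tag == stem or tag.startswith(stem + "-")


def matches_range(node, rng):
    # Assumption: the range stem is always a wildcard, as in the example.
    stems = [_stem(e) for e in rng["exclusions"]]
    if rng["type"] == "LiteralStemRange":
        return (node.kind == "literal"
                and not any(node.value.startswith(s) for s in stems))
    if rng["type"] == "LanguageStemRange":
        return (node.lang is not None
                and not any(_lang_has_prefix(node.lang, s) for s in stems))
    return False


def matches_value_set(node, values=VALUES):
    # A value set accepts a node if any one of its values accepts it.
    return any(matches_range(node, rng) for rng in values)
```

Under these assumptions, `matches_value_set(Node("iri", "http://a.example"))` is false, because neither typed range can accept an IRI, whereas a plain literal such as `Node("literal", "hello")` is accepted by the first range.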
With homogeneous exclusions, we can reflect the ShExJ. You could still state the above, but you'd need two terms in the valueSet:
foo:code [
. -'a'~ -'e'~ # any string, except one starting with 'a' or 'e'
. -@en-UK~ -@fr~ # none of them Britishisms, and nothing French
]
Here's the grammar that ShExJS uses (which passes the tests):
valueSetValue: iriRange | literalRange | languageRange
| '.' (iriExclusion+ | literalExclusion+ | languageExclusion+)
iriRange: iri ('~' iriExclusion*)?
iriExclusion: '-' iri '~'?
literalRange: literal ('~' literalExclusion*)?
literalExclusion: '-' literal '~'?
languageRange:
LANGTAG ('~' languageExclusion*)?
| '@' '~' languageExclusion*
languageExclusion: '-' LANGTAG '~'?
Which lines up with https://github.com/shexSpec/grammar/blob/master/ShExDoc.g4#L149-L161.
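The homogeneity rule in the `'.'` branch of valueSetValue can be sketched as a small check over already-tokenized exclusions. This is only an illustration, not the ShExJS parser: the function names are hypothetical, and the classification is deliberately crude (anything starting with `@` is a language exclusion, anything starting with a quote is a literal exclusion, everything else is assumed to be an IRI or prefixed name), so numeric and boolean literals are not handled.

```python
def exclusion_kind(token: str) -> str:
    """Classify one exclusion token such as "-'a'~" or "-@fr~" (simplified)."""
    body = token.lstrip("-").strip()
    if body.startswith("@"):
        return "language"
    if body[:1] in ("'", '"'):
        return "literal"
    return "iri"  # assumption: anything else is an IRI or prefixed name


def wildcard_exclusions_ok(exclusions) -> bool:
    """True if a '.' term is followed by one or more exclusions of one kind."""
    kinds = {exclusion_kind(tok) for tok in exclusions}
    return len(kinds) == 1
```

With this check, the mixed list `["-'a'~", "-'e'~", "-@en-UK~", "-@fr~"]` is rejected, while each half on its own is accepted. An empty list is also rejected, since the production requires at least one exclusion after the `'.'`.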
PROPOSE: adopt the ANTLR productions for valueSetValue.
That seems reasonable, although I'll need to implement it for myself to be sure.
As noted in https://lists.w3.org/Archives/Public/public-shex/2021Aug/0001.html:
In ShEx 2.0, the productions were defined as follows:
In ShEx 2.1, they were updated to the following:
But the note on [49] still notes "If "." matches and exclusion matches one or more times", and that doesn't make sense in this context. Also, the third ValuesConstraint example has a '.' only at the beginning:
Looks like the changes were made in error? Certainly, the new grammar is not forward-compatible with 2.0.