Open kad-beekw opened 3 years ago
The pattern is but a simple escaped regex. You need not look into the XSD escaping rules, which I do not see mentioned by SHACL spec
This totally works:
[ sh:pattern "\\S+" ]
@tpluscode Thanks for your response!
You need not look into the XSD escaping rules, which I do not see mentioned by SHACL spec
I see the following trail when I look through the standard:
The main discussion I think is whether XSD 1.0 or XSD 1.1 should be used.
To be honest, I like your regex notation better, since it is a bit simpler :-). However, I can imagine that there is a benefit in following the specification. There may be cases in which a regular expression stored in SHACL can be matched and reused in a SPARQL query. (I'm not sure whether this is a good use case, but what I'm getting at is that using the same regex notation across SHACL, SPARQL and XSD may facilitate crossover use cases.)
I must admit I am a little confused myself, not having dug deep before.
You seem correct about how you followed your nose from the SHACL to the XSD specs. Section 7.1 of XPath seems to suggest that XSD 1.1 should be used, doesn't it?
That said, the examples in the SHACL spec use the simple escaping (it's pretty much just the backslash). And FWIW the section for sh:pattern says
The values of sh:pattern in a shape are valid pattern arguments for the SPARQL REGEX function.
This is definitely valid SPARQL :)
filter ( regex( ?name, "^\\S+" ) )
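The two layers of escaping can be checked outside SPARQL. A minimal sketch in TypeScript, assuming its string literals resolve a doubled backslash the same way Turtle and SPARQL string literals do (to a single backslash in the resulting string); the names lexical and matchesName are illustrative only:

```typescript
// The literal "^\\S+" denotes the four-character string ^\S+ once the
// string-level escape has been resolved.
const lexical: string = "^\\S+";

// Hypothetical helper: apply the pattern as the regex engine receives it.
function matchesName(name: string): boolean {
  return new RegExp(lexical).test(name);
}

console.log(lexical.length);         // 4
console.log(matchesName("Alice"));   // true
console.log(matchesName(" Alice"));  // false: the leading space is not \S
```

So the doubled backslash belongs to the string syntax, and the regex engine only ever sees \S.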
@tpluscode Thanks, XSD 1.1 indeed seems to be the intended standard for regex in SHACL (and SPARQL). I do not have enough knowledge of XSD to determine whether \S is also valid. When I look at the XSD 1.1 standard I can only find the charProp grammar rule being used within the \p{...} or \P{...} notation:
[85] catEsc ::= '\p{' charProp '}'
[86] complEsc ::= '\P{' charProp '}'
[87] charProp ::= IsCategory | IsBlock
Maybe \p{S} is commonly written as \S in SPARQL? If so, this may be a de facto extension of the XSD 1.1 syntax?
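For comparison, the two escapes are not synonyms: the XSD regex grammar lists \S separately among its multi-character escapes (the MultiCharEsc rule), where it means "any non-whitespace character", while \p{S} is the Unicode Symbol category. A small TypeScript sketch, assuming JavaScript's RegExp with the u flag as a stand-in for the XSD semantics (the exact set of whitespace characters differs slightly between the two); the helper name fullMatch is made up for this example:

```typescript
// Hypothetical helper: does the whole string match the pattern?
// The "u" flag is what enables \p{...} in JavaScript's RegExp.
function fullMatch(pattern: string, value: string): boolean {
  return new RegExp("^(?:" + pattern + ")$", "u").test(value);
}

console.log(fullMatch("\\S", "a"));     // true: "a" is not whitespace
console.log(fullMatch("\\p{S}", "a"));  // false: "a" is a letter, not a symbol
console.log(fullMatch("\\p{S}", "+"));  // true: "+" is a math symbol (Sm)
console.log(fullMatch("\\S", " "));     // false: a space is whitespace
```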
Whatever the case may be, some regex strings that seem to be valid in XSD 1.1 do not seem to be supported by this SHACL library. Maybe this is not so bad: the XSD 1.1 standard is sufficiently unreadable to prevent large groups of users from picking up the regex grammar described in it. Maybe the de facto way of writing regex is more popular.
Thanks for maintaining this great library!
Observation
I'm unable to properly validate place names using sh:pattern. Place names may include spaces, single quotes, hyphens, and some non-ASCII Unicode characters. Examples of place names that should succeed are 's-Gravenhage, The Hague, and Köln.
If I understand the somewhat cryptic XSD standard (link), then this should be expressible in the following way:
But the following data does not validate:
Since many natural languages include characters that do not occur in simple ASCII ranges like [A-Za-z], and because natural language information is very common in RDF data, support for validating Unicode strings in sh:pattern is useful in many cases.
Expected
The ability to use category escapes in sh:pattern, specifically for natural language content for which simple ranges are difficult/impossible to express.
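A sketch of the expected behaviour, written in TypeScript purely as an illustration. Assumptions: the pattern below is a made-up stand-in (the pattern from the original shape is not reproduced here), the helper name isPlaceName is hypothetical, and the comparison without the u flag only shows how JavaScript's RegExp treats \p{...}; it says nothing about how this library actually compiles sh:pattern.

```typescript
// Made-up pattern: letters, combining marks, apostrophe, space, hyphen.
const placeNamePattern: string = "^[\\p{L}\\p{M}' -]+$";

// Hypothetical helper; the flags argument is exposed only for comparison.
function isPlaceName(value: string, flags: string = "u"): boolean {
  return new RegExp(placeNamePattern, flags).test(value);
}

const names: string[] = ["'s-Gravenhage", "The Hague", "Köln"];
for (const name of names) {
  // Second column: with the "u" flag. Third column: without it.
  console.log(name, isPlaceName(name), isPlaceName(name, ""));
}
```

With the u flag all three names match. Without it, JavaScript reads \p as a plain "p", so the character class degenerates into a handful of literal characters and none of the names match. If a JavaScript-based validator behaved like the second column, that would be one possible explanation for the observation above, but this is a guess.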