shexSpec / shex

ShEx language issues, including new features for e.g. ShEx2.1

Datatype example uses non-sparql11 datatype #81

Closed hsolbrig closed 6 years ago

hsolbrig commented 6 years ago

Example 1 in 5.4.3 Datatype Constraints has a pass and a fail for xsd:date. The spec asserts "Only datatypes supported by SPARQL MUST be tested but ShEx extensions MAY add support for other datatypes." Because xsd:date is not supported by SPARQL, the results of the test are unspecified.

ericprud commented 6 years ago

I believe new wording is required here. The behavior required by the test suite is actually to support datatypes not listed in SPARQL, but not to require any type inference about them. This is consistent with SPARQL which specifies behavior for e.g. "iv"^^my:romanNumeral.
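
A minimal Python sketch of the behavior described above: a literal whose datatype is not in the validator's table satisfies a datatype constraint on IRI equality alone, with no inference about its lexical form. The registry `LEXICAL_CHECKS`, the function `satisfies_datatype`, and the `http://example.org/my#` namespace standing in for `my:` are all assumptions made for illustration, not part of ShEx or SPARQL.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical registry: datatype IRI -> regex for its lexical space.
# Only two entries are shown; a real validator would list every required type.
LEXICAL_CHECKS = {
    XSD + "integer": re.compile(r"[+-]?[0-9]+"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
}

def satisfies_datatype(lexical, datatype, required):
    """Return True if the literal (lexical, datatype) satisfies the constraint."""
    if datatype != required:
        return False
    check = LEXICAL_CHECKS.get(datatype)
    if check is None:
        # Unknown datatype: match on the IRI only, no lexical validation.
        return True
    return check.fullmatch(lexical) is not None
```

Under these assumptions, `satisfies_datatype("iv", "http://example.org/my#romanNumeral", "http://example.org/my#romanNumeral")` is accepted, while `"abc"` typed as `xsd:integer` is rejected.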

ericprud commented 6 years ago

We need to enumerate for which datatypes ShEx requires validation of the lexical form. There are currently tests for lexical conformance with the subset of XSD datatypes covered by SPARQL1.1 and numeric range conformance with the subset of derived numeric types covered by SPARQL1.1.
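
To illustrate the two kinds of test mentioned above, here is a rough Python sketch that checks lexical conformance first and numeric range second for a few derived integer types. The table `RANGES` and the function `conforms` are hypothetical names, only four types are listed, and the shared integer pattern is a simplification of the per-type lexical rules in XML Schema.

```python
import re

# Value ranges for a few XSD derived integer types; None means unbounded.
RANGES = {
    "xsd:byte": (-128, 127),
    "xsd:unsignedByte": (0, 255),
    "xsd:short": (-32768, 32767),
    "xsd:nonNegativeInteger": (0, None),
}

# Simplified lexical space shared by the integer-derived types.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

def conforms(lexical, datatype):
    """Check lexical form, then numeric range, for a type listed in RANGES."""
    low, high = RANGES[datatype]  # raises KeyError for types not in the table
    if INTEGER_LEXICAL.fullmatch(lexical) is None:
        return False  # lexical failure, e.g. "1.0" or "abc"
    value = int(lexical)
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True
```

With this table, `"127"` conforms to `xsd:byte` but `"128"` does not, and `"1.0"` fails on lexical form before any range check is made.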

The SPARQL1.1 derived types exclude types derived from xsd:string. These aren't useful in RDF:

  1. normalizedString - a string with no TABs or LFs.
  2. token - a string with no TABs, LFs or leading or trailing SPACEs.
  3. language - RFC3066 language tag (explicitly NOT a datatype in RDF).
  4. NMTOKEN - an XML identifier string composed of the middle characters in ShEx/SPARQL local names.
  5. NMTOKENS - space separated list of one or more NMTOKENs.
  6. Name - ShEx/SPARQL local names without the restriction on trailing '.'.
  7. NCName - "non-colon name", i.e. Name - ':'
  8. ID - XML attribute; a subclass of NCName.
  9. IDREF - another subclass of NCName.
  10. IDREFS - SPACE-separated list of IDREF.
  11. ENTITY - yet another subclass of NCName.
  12. ENTITIES - SPACE-separated list of ENTITY.

The SPARQL1.1 primitive types exclude:

  1. anyType - effectively an abstract type; not useful for annotating literals.
  2. anySimpleType - also abstract.
  3. duration - dateTime differences expressed as 'P' + 'nY' (years) + 'nM' (months) + 'nD' (days) + 'T' + 'nH' (hours) + 'nM' (minutes) + 'n.nn…S' (seconds), e.g. P1Y2MT2H.
  4. time - everything to the right of 'T' in a dateTime.
  5. date - everything to the left of 'T' in a dateTime.
  6. g{YearMonth,Year,MonthDay,Day,Month} - parts of a dateTime.
  7. base64Binary - arbitrary byte sequences expressed according to RFC2045.
  8. hexBinary - arbitrary byte sequences expressed as upper or lower-case hex.
  9. anyURI - right or wrong, literal URLs (e.g. someone's homepage) are typically expressed as RDF IRIs.
  10. QName - an XML prefixed name.
  11. NOTATION - QNames again.
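
The duration format described in item 3 above can be sketched as a regular expression. This is an approximation written for illustration (the names `DURATION` and `valid_duration` are hypothetical); it checks only the lexical shape, requiring at least one component and at least one time component after 'T'.

```python
import re

# 'P', then optional year/month/day parts, then an optional 'T' section
# with hour/minute/second parts. The lookaheads reject a bare "P" or a
# trailing "T" with nothing after it.
DURATION = re.compile(
    r"-?P(?=.)([0-9]+Y)?([0-9]+M)?([0-9]+D)?"
    r"(T(?=.)([0-9]+H)?([0-9]+M)?([0-9]+(\.[0-9]+)?S)?)?"
)

def valid_duration(lexical):
    """Return True if the string has the shape of an xsd:duration."""
    return DURATION.fullmatch(lexical) is not None
```

The example from the list, `P1Y2MT2H`, matches; `P`, `PT`, and `P1H` (hours without the 'T' separator) do not.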

The RDF model, and most serializations of RDF (not RDF/XML), include the character \U0000, so the base64Binary and hexBinary datatypes aren't needed.

ericprud commented 6 years ago

PROPOSED: ShEx validates lexical forms of:

  1. the subset of XSD primitive types supported by SPARQL1.1 plus xsd:date and xsd:time.
  2. the subset of XSD derived numeric types supported by SPARQL1.1.
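
As a sketch of what validating the two added types could involve, the Python below checks the lexical forms of xsd:date and xsd:time. The patterns are modeled on the regular expressions given in XML Schema 1.1 Part 2 but are written from memory, so treat them as an approximation; the names `valid_date` and `valid_time` are hypothetical.

```python
import re

# Optional timezone: 'Z' or an offset from -14:00 to +14:00.
TZ = r"(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"
DATE = re.compile(
    r"(-?(?:[1-9][0-9]{3,}|0[0-9]{3}))-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])" + TZ
)
TIME = re.compile(
    r"(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?|24:00:00(\.0+)?)" + TZ
)

def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def valid_date(lexical):
    """Check the xsd:date pattern, then the day against the month length."""
    m = DATE.fullmatch(lexical)
    if m is None:
        return False
    year, month, day = int(m.group(1)), int(m.group(2)), int(m.group(3))
    days = [31, 29 if is_leap(year) else 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    return day <= days[month - 1]

def valid_time(lexical):
    """Check the xsd:time pattern."""
    return TIME.fullmatch(lexical) is not None
```

For example, `2018-02-26` passes and `2018-02-30` fails on the day-of-month check, while `13:20:00-05:00` passes and `25:00:00` fails on the pattern.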

jimkont commented 6 years ago

Resolved with votes from @ericprud, @labra, @gkellogg, @hsolbrig, @emulatingkat, @larsgsvensson, @jimkont, @tombaker, and @lucaswerkmeister, who reacted with a thumbs up on https://github.com/shexSpec/shex/issues/81#issuecomment-368446352

Action before we close: Update the spec accordingly

ericprud commented 6 years ago

Added this to § 5.4.3 Datatype Constraints:

The lexical form and numeric value (where applicable) of all datatypes required by SPARQL XPath Constructor Functions MUST be tested for conformance with the corresponding XML Schema form. ShEx extensions MAY add support for other datatypes.