Use of schema start vs shapeMap

gkellogg commented 7 years ago

The spec does not really lay out success criteria for a schema. There seem to be two scenarios:

1) The schema is invoked with a shapeMap, which associates nodes within a target graph with one or more shape labels. This succeeds when each node is valid against all of its associated shapes (see isValid).
2) The schema has a start, which either identifies a shape or is an inline shape. It is unspecified which nodes are to be run against this shape to determine validity.

Within tests, the convention has been to pass a focus node and a shape label; in the case of a start, the focus node is used against that shape. The manifest entry may also list a shape label, in which case a shapeMap is composed using the focus node and the shape label. Unfortunately, the spec provides no guidance about this; IMHO, the spec MUST describe how nodes are associated with start.

The proposal @ericprud and I worked out was the following:

start is ShExC syntactic sugar for an entry in shapes where, if it is not a shape reference, the label is shex:Start (which MUST NOT be used explicitly in the schema definition).
Schema evaluation MAY take one or more focus nodes. If any are present, there MUST be a start, and the schema isValid if all focus nodes are valid against the start shape; this is done by adding an entry to shapeMap that associates each focus node with the start shape.
In general, a schema isValid if all semantic actions succeed, all focus nodes are valid against the start shape, and all nodes in the shapeMap are valid against their associated shapes.
Undiscussed, but necessary to consider, is the case where start is a shape label. IMO, this is treated as above, but using the referenced shape label rather than shex:Start.
In the abstract syntax (ShExJ/RDF), start is subsumed within shapes (although how this works when start is a shape label needs to be worked out).
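
To make the proposal above concrete, here is a minimal Python sketch of folding start into shapes. It is only an illustration under stated assumptions, not spec text: the schema is modelled as a plain dict, `desugar_start` is a hypothetical helper name, and the full IRI for shex:Start assumes the shex: prefix maps to http://www.w3.org/ns/shex#.

```python
# Assumption: shex: expands to http://www.w3.org/ns/shex#.
SHEX_START = "http://www.w3.org/ns/shex#Start"


def desugar_start(schema):
    """Fold `start` into `shapes` and return (shapes, start_label).

    `schema` is a hypothetical dict of the form
    {"start": <label string or inline expression dict>, "shapes": {label: expr}}.
    """
    shapes = dict(schema.get("shapes", {}))
    if SHEX_START in shapes:
        # The proposal says shex:Start MUST NOT be used explicitly.
        raise ValueError("shex:Start must not appear explicitly in the schema")

    start = schema.get("start")
    if start is None:
        return shapes, None
    if isinstance(start, str):
        # start is a shape reference: keep using the referenced label.
        return shapes, start

    # start is an inline shape expression: store it under shex:Start.
    shapes[SHEX_START] = start
    return shapes, SHEX_START
```

With this shape, a caller never needs to treat start specially after loading: it is just one more labelled entry, which is the point of calling it syntactic sugar.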

ericprud commented 7 years ago

After our discussion, I realized that lots of schemas will have a shex:Start shape expr, which kinda breaks SemWeb identifier rules.

gkellogg commented 7 years ago

Main thing is to clarify the node(s) used with the start expression.

ericprud commented 7 years ago

I think most use cases involve an application having a set of nodes and the out-of-band knowledge that they must match start. ShEx's start used to be a shape label and we extended it for such a use case (cf. #9). It's possible that we picked the wrong answer and, instead of adding expressivity to start, we should have injected a shape into those schemas a la:

    start=_:start
    _:start <Obs> OR <Patient> OR <Procedure>

This is more chatty, as it requires an intermediate shape expression identifier, but it also makes it clear that start is simply a default that comes with the schema. XML Schema survives without such an identifier, but at the cost that protocols that use XML Schema, e.g. WSDL, have to maintain pairings of a schema and a start type. An example use is the schema for the ~190 FHIR resources (e.g. DiagnosticReport), which uses the start to stipulate some constraints on the starting shape.

Option 1: change start to be a reference and change deployed schemas with a complex start expression to invent an identifier for that shape expression.

Option 2: Add a paragraph to 4.1 Shapes Schema a la:

start is an extra shape expression which may be used by an application for which invocation parameters may supply some number of nodes without associated shapes.

and some tests that include initial ShapeMaps with a specific convention used in the test suite, e.g. - start -, to indicate that those nodes are tested against the start shape, for example:

    mf:action [
      sht:schema <../schemas/0.shex> ;
      sht:data <empty.ttl> ;
      sht:shapeMap
        [ sht:focus <http://a.example/n1>; sht:shape <http://a.example/S1> ],
        [ sht:focus <http://a.example/n2>; sht:shape "- start -" ],
        [ sht:focus <http://a.example/n3>; sht:shape <http://a.example/S2> ]
    ] ;
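
As an illustration of how a test harness might consume the "- start -" convention shown above, here is a small Python sketch. The function name `resolve_start_marker` and the representation of a ShapeMap as a list of (focus, shape) pairs are assumptions made for this example; only the marker string comes from the thread.

```python
START_MARKER = "- start -"  # test-suite convention floated in this thread


def resolve_start_marker(shape_map, start):
    """Replace the start marker in each (focus, shape) pair with `start`.

    `start` is whatever the schema provides as its start shape (a label or
    an inline expression), or None if the schema has no start.
    """
    resolved = []
    for focus, shape in shape_map:
        if shape == START_MARKER:
            if start is None:
                raise ValueError(f"{focus} targets start, but the schema has none")
            shape = start
        resolved.append((focus, shape))
    return resolved
```

Entries that name an explicit shape pass through unchanged, so the marker only affects nodes that were deliberately left without a shape.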

gkellogg commented 7 years ago

All things considered, I'd favor Option 2, but I don't see the need to include "- start -" in the ShapeMap, as you already say it is invoked with parameters that supply some number of nodes without associated shapes.

So, on execution, if such nodes are identified, there MUST be a start shape, and those nodes are validated against that shape (which may, of course, be a reference to a named shape in shapes). Additionally, validate each node/shape pair in the ShapeMap.

Action might then be:

    mf:action [
      sht:schema <../schemas/0.shex> ;
      sht:data <empty.ttl> ;
      sht:focus <http://a.example/n2> ;
      sht:shapeMap
        [ sht:focus <http://a.example/n1>; sht:shape <http://a.example/S1> ],
        [ sht:focus <http://a.example/n3>; sht:shape <http://a.example/S2> ]
    ] ;

(where sht:focus may be multi-valued in mf:action).
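
The execution rule described in this comment can be sketched in Python as follows. This is a hedged illustration rather than the spec's algorithm: `schema_is_valid` is a hypothetical name, and `conforms(node, shape)` stands in for whatever per-node validation an implementation provides.

```python
def schema_is_valid(start, focus_nodes, shape_map, conforms):
    """Check bare focus nodes against start, plus every ShapeMap pair.

    start       -- the schema's start shape, or None if it has none
    focus_nodes -- nodes supplied without an associated shape
    shape_map   -- iterable of (node, shape) pairs
    conforms    -- assumed callback: conforms(node, shape) -> bool
    """
    focus_nodes = list(focus_nodes)
    if focus_nodes and start is None:
        # Nodes without shapes were supplied, so there MUST be a start.
        raise ValueError("focus nodes were supplied but the schema has no start")

    # Bare focus nodes are paired with start; ShapeMap pairs are used as given.
    pairs = [(node, start) for node in focus_nodes] + list(shape_map)
    return all(conforms(node, shape) for node, shape in pairs)
```

For the action above, this would be called with `focus_nodes` holding n2 and `shape_map` holding the n1/S1 and n3/S2 pairs. A schema with no start remains usable as long as every node arrives through the ShapeMap.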

labra commented 7 years ago

I also like Option 2 and Gregg's suggestion that if there is some mechanism to signal focus nodes outside of the shape map, then there must be a start shape and those focus nodes must conform with it.

gkellogg commented 7 years ago

This is resolved, and defined in the WebIDL API section.

shexSpec / shex

Use of schema start vs shapeMap #42