w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

Definition of "Schema" (as opposed to profile) #195

Open larsgsvensson opened 6 years ago

larsgsvensson commented 6 years ago

While we have agreed on a definition of "Profile", we still don't have a working definition (nor an agreed-on term) for the implementation of a profile using a specific technology, e. g. ShEx or XML Schema. I propose the use of "schema" for the profile implementation and the following definition: "A schema is an implementation of a profile using a specific technology. Some schema languages are tied to specific media types, e. g. JSON Schema can only be used to describe the structure of JSON documents (application/json) and XML Schema only to describe XML documents (application/xml) whereas other schema languages are tied to specific technologies, e. g. SHACL and ShEx can be used to describe the structure of an RDF document whether the RDF document is expressed in Turtle (text/turtle), JSON-LD (application/ld+json) or RDF/XML (application/rdf+xml)." Comments and other suggestions are most welcome.
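To make the distinction in the proposed definition concrete, here is a minimal Python sketch. It assumes a hypothetical lookup table, `SCHEMA_LANGUAGE_TARGETS`, and a hypothetical helper, `can_describe`; neither comes from DCAT or any DXWG document, and the media types listed are only the examples named in the definition, not a complete set.

```python
# Hypothetical illustration: which media types a schema language can describe.
# The table holds only the examples from the proposed definition; it is not exhaustive.

RDF_MEDIA_TYPES = frozenset({
    "text/turtle",
    "application/ld+json",
    "application/rdf+xml",
})

SCHEMA_LANGUAGE_TARGETS = {
    # Tied to a single media type.
    "JSON Schema": frozenset({"application/json"}),
    "XML Schema": frozenset({"application/xml"}),
    # Tied to a data model (RDF), independent of the serialisation.
    "SHACL": RDF_MEDIA_TYPES,
    "ShEx": RDF_MEDIA_TYPES,
}


def can_describe(schema_language: str, media_type: str) -> bool:
    """Return True if the schema language can describe documents of this media type."""
    return media_type in SCHEMA_LANGUAGE_TARGETS.get(schema_language, frozenset())
```

For example, `can_describe("SHACL", "application/ld+json")` holds while `can_describe("XML Schema", "text/turtle")` does not, which is the asymmetry the definition tries to capture.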

akuckartz commented 6 years ago

A minor comment: the list of RDF serialisations at the end is not complete. They are only examples.

kcoyle commented 6 years ago

Could the difference be that a schema is actionable as code, but a profile may be merely a document? (This then begs the question of broken schemas - in code but that don't work - but I think that's true for all schemas, such as bad XSD.)

nicholascar commented 6 years ago

The Profiles Description work is designed to address these issues comprehensively, see the profiledesc folder.

Referring to the ontology diagram (https://github.com/w3c/dxwg/blob/gh-pages/profiledesc/profiledesc.png), we say that a Profile is a profile of something else and that an Implementation Resource Descriptor describes the way in which the profiling is done. The ImplResourceDesc class can do this with a resource format, a resource role and a resource type. These may indicate that the implementing resource (the thing doing the work of profiling) is "actionable as code" like SHACL or "merely a document" like a PDF.

In the examples given so far, we have cases of a Profile that has both an "actionable as code" implementation and a "merely document" implementation, see how the existing DCAT-AP profiling of DCAT is documented with both SHACL and document implementations!

dr-shorthair commented 6 years ago

Is it possible that resourceType is equivalent to (or perhaps a sub-property of) dct:conformsTo ?

(@rob-metalinkage and I had an opportunity to look at this together last week and this is one of the simplifications/clarifications/potential alignments that we spotted).

dr-shorthair commented 6 years ago

NB - to read the documentation for the profile ontology see https://w3c.github.io/dxwg/profiledesc/profiledesc.html

Thanks @rob-metalinkage and @nicholascar

makxdekkers commented 6 years ago

Before deciding whether, and if so, how, prof:resourceType is related to dct:conformsTo, I suggest we first decide on the semantic definition of prof:resourceType. Is it a (kind of) "established standard to which the described resource conforms", and how does its proposed range skos:Concept relate to the range of dct:conformsTo, which is dct:Standard?

fellahst commented 6 years ago

I like @larsgsvensson's proposal to introduce the term Schema to be used as an implementation of an (application) profile. I would suggest we replace the ugly name ImplResourceDesc with Schema and have a relation between Profile and Schema called hasSchema or schema. A Schema can be seen as a subclass of Dataset with one or more Distributions.

kcoyle commented 6 years ago

I understand the ambiguity of "schema" but agree that Implementation Resource Descriptor is hard to say and also hard to ingest semantically.

What I think we are missing here is the placement of the profile description within the catalog and in relation to DCAT. Does someone have a macro diagram that puts it all together?

fellahst commented 6 years ago

During the OGC Testbed 12, we worked on a Semantic Mediation Service. We introduced the concepts of Schema and SchemaMapping. I think this is relevant for this discussion. Here is a link to the model: http://docs.opengeospatial.org/per/16-059.html#_srim_schema_application_profile

rob-metalinkage commented 6 years ago

Sounds to me as if we are happy with the nature of the beast - not the name - so the ugly name has served its short term goal and may now be honourably discharged.

"Implementation Resource Descriptor" is exactly but awkwardly semantically correct - the object is a descriptor of a resource that defines some aspect(s) of implementation of a profile.

If we are happy that "schema" includes "guidance notes" then I guess it works - although it feels a bit weird. And note the range is not a schema, it's a descriptor that qualifies a reference to the actual schema. prof:resource might be better or even prof:constraints - as the resource must specify constraints.

"aspect" may be better.

aisaac commented 6 years ago

"Schema" sounds really bad/dangerous to me, to the point that I would prefer "Implementation Resource Descriptor". The existence of RDF Schema alone is going to make any story around a 'profile schema' extremely hard to tell.

Couldn't we simply use something like "profile implementation" or "machine-processable profile implementation" for the time being? (btw using "resource" and "descriptor" is probably going to be tough in the context called the "Resource Description" Framework. Like, everything around us is resources and descriptions ;-)

aisaac commented 6 years ago

On 17/04/18 17:15, Peter.Winstanley@gov.scot wrote:

Antoine Can we get any mileage out of other similar terms such as 'pattern' or 'design' or even 'archetype'/'prototype'?

I'm not sure: these sound like quite abstract resources. I guess that with 'schema' @larsgsvensson wanted to reflect that the thing is processable. Actually maybe 'Processable Profile Implementation' would be suitable?

larsgsvensson commented 6 years ago

ProcessableProfileImplementation of course reflects nicely what it is. That said, it's a bit verbose and I don't know if IANA would accept an Accept-ProcessableProfileImplementation-header... As @aisaac said, schema is overloaded but so are most short catchy terms. My original suggestion for schema came indeed from XML Schema and the fact that someone (cannot remember who, but it was at the TPAC in Sapporo) told me that if RDF Schema hadn't been taken already, "RDF Shapes" most likely would have been named "RDF Schema". Hmmm, perhaps shape could be an alternative...

fellahst commented 6 years ago

There are a number of schema languages out there that are used for enforcing data to comply with an (application) profile: SQL Schema, XML Schema, JSON Schema, Schematron, SHACL, ShEx, Relax, RDF Schema, OWL. They all define an encoding that can be understood by tools to perform structural, syntactical or logical validation. Some schema languages are complementary, such as XML Schema and Schematron, or OWL and SHACL. Some are more loosely defined, such as RDF Schema; others are more expressive, such as Schematron and SHACL. IMHO, not all applications require having complete compliance with all the rules associated with a profile. In this context, associating the term "Schema" with Profile makes sense. Schema provides a way to encode the rules that the profile needs to adhere to.

makxdekkers commented 6 years ago

Just my two cents: Why not call what is now called Profile -- the conceptual definition -- ProfileSpec and what is now being called awkwardly ProcessableProfileImplementation simply Profile? To be honest, I am usually much less interested in what the URI or the label is and much more interested in what the definition is. If I understand correctly, a URI like prof:ProcessableProfileImplementation is an attempt to put semantics into the URI which I think is not always a good idea. And in any case, a non-English speaker would probably need to look up the definition anyway.

aisaac commented 6 years ago

@larsgsvensson I had missed that this was for the IANA header. I was focusing on the ontology defined by @rob-metalinkage and @nicholascar .

@fellahst the problem is that as @larsgsvensson said it, half of the languages that you cite (RDFS, OWL, ShEx, SHACL) are not recognized as 'schema' languages the same way as the other half is.

aisaac commented 6 years ago

@makxdekkers this is seducing. I don't know if this is going to play well wrt headers, though. We need both, don't we, @larsgsvensson ? And if we come with two labels that are not easy to distinguish we're going to have a hard time telling the story.

And yes I guess this comes from the expectations that URI/labels should reflect semantics. And IANA has it too, apparently!

rob-metalinkage commented 6 years ago

And just to re-assert - in the current real world such resources may also be non-machine readable documents as either formal specifications or guidance notes. We could define sub-properties for different resource types - but we also mustn't confuse type with roles (e.g. a non-normative partial check of conformance to a spec, expressed in SHACL - I think that's the DCAT-AP case).

rob-metalinkage commented 6 years ago

IMHO "shape" is actually pretty close to the concept of profile (not the processable resource describing it) - i.e. this is the contents of the graph I expect to see. A shape can be expressed in SHACL or ShEx.

It's really a subclass of profile, however - some profiles may constrain the content, not the shape.

From my reading it's a naming discussion we are having, with a bit of premature narrowing ("processable") creeping in.

aisaac commented 6 years ago

@rob-metalinkage I don't understand your "in the current real world such resources may also be non-machine readable documents as either formal specifications or guidance notes.". @larsgsvensson started with asking about "an agreed-on term for the implementation of a profile using a specific technology, e. g. ShEx or XML Schema", which does sound like a machine-readable document.

rob-metalinkage commented 6 years ago

@aisaac evidence: DCAT-AP is a document. Most OGC profiles are documents - but some profiles also have XSD and schematron resources. Many specifications have separate guidance notes and worked examples. Many have separate conformance test suites.

We are describing how these relate as well as setting up for increasingly machine processable specifications. My implementation Use Case at OGC cares as much about the legacy as the future state.

aisaac commented 6 years ago

@rob-metalinkage I very much agree that some specifications won't be machine-readable. But @larsgsvensson's starting point for this issue was about implementation in a specific technology, referring (only) to machine-processable examples, hence I'm assuming that he had something specific in mind. And @kcoyle referred to "a schema is actionable as code". But I'm going to stop speaking in their place, maybe I'm over-interpreting.

rob-metalinkage commented 6 years ago

Also w.r.t. @larsgsvensson's initial comment, this must be taken in the context of the agreed definition of a profile - which is agnostic about processability

aisaac commented 6 years ago

@rob-metalinkage sure but Lars was asking about something more specific than 'profile' in general. Or are you hinting that we shouldn't seek to extensively discuss anything about processability at this stage? Like, as I'm not convinced we should discuss something like a 'formalism' at this stage, see https://github.com/w3c/dxwg/issues/194?

pwin commented 6 years ago

do terms such as 'template' or 'form' fit here?

larsgsvensson commented 6 years ago

@nicholascar wrote:

In the examples given so far, we have cases of a Profile that has both an "actionable as code" implementation and a "merely document" implementation, see how the existing DCAT-AP profiling of DCAT is documented with both SHACL and document implementations!

Yes, I now start to grasp that. My view so far had been that there is a profile URI that (per content negotiation) resolves to human-readable and machine-understandable profile descriptions/definitions. From there (i. e. the profile URI) there are links to other documentation and to machine-processable versions of the profile (e. g. XML Schema, ShEx etc.).

Those two approaches are obviously something we need to discuss as a group so I'm happy to see that there is plenty of time scheduled at the Genoa meeting.

larsgsvensson commented 6 years ago

@aisaac wrote:

@larsgsvensson I had missed that this was for the IANA header. I was focusing on the ontology defined by @rob-metalinkage and @nicholascar .

Well, it's not only for the IANA header, but also generally about how we name things and (as @makxdekkers said) what the nature of those things is.

larsgsvensson commented 6 years ago

@aisaac wrote:

@makxdekkers this is seducing. I don't know if this is going to play well wrt headers, though. We need both, don't we, @larsgsvensson ? And if we come with two labels that are not easy to distinguish we're going to have a hard time telling the story.

The name we eventually use for the ImplementationResourceDescriptor doesn't have to be the same that we use for the header, particularly given that the scope of the content negotiation is the profile and not the profile implementation (so my former comment about the IANA header wasn't a very clever one...). That said, it of course simplifies the marketing if we can use the same name everywhere.

nicholascar commented 6 years ago

If you really must have a shorter word for Implementation Resource Descriptor, how about Definer? A Definer defines a Profile with a format, its own conformance to a Profile and a role (dct:format, dct:conformsTo and prof:resourceRole respectively).

larsgsvensson commented 6 years ago

@rob-metalinkage wrote

Also w.r.t. @larsgsvensson's initial comment, this must be taken in the context of the agreed definition of a profile - which is agnostic about processability

Yes, I agree there: A Profile is agnostic about processability. But as @aisaac said, I'm looking for a term explicitly for the processable implementation of a profile, the one we can use for validation, UI building or for creating code (similar to Java XMLBeans).

larsgsvensson commented 6 years ago

@nicholascar wrote:

If you really must have a shorter word for Implementation Resource Descriptor, how about Definer? A Definer defines a Profile with a format, its own conformance to a Profile and a role (dct:format, dct:conformsTo and prof:resourceRole respectively).

Yes, could be a solution. And the more I look at it, I think that we first should figure out what the model looks like and then try to figure out names for the things in it (as @makxdekkers suggested).

larsgsvensson commented 6 years ago

Having looked at Rob's and Nicholas's model again, I now see that what I call "Profile" is a conflation of what they call "Profile" and "ImplementationResourceDescriptor". What I call "Schema" is also an "ImplementationResourceDescriptor" and that's probably one reason for my confusion. The idea of the "ImplementationResourceDescriptor" is of course to have a node where we can add metainformation (e. g. that what we describe is a SHACL document serialised as Turtle) and that's completely lacking in my naive "Profile". If we have "ImplementationResourceDescriptor"s, then what they describe perhaps shouldn't be called "ImplementationResourceDescriptor", too, but "ImplementationResource"s (since that is what an ImplementationResourceDescriptor describes), and the property linking the "ImplementationResourceDescriptor" to the thing it describes could be called "prof:describes" instead of "prof:resource" (so that the rdfs:range does not make it an "ImplementationResourceDescriptor"). Then the model would be:

:aProfile a prof:Profile ;
    prof:hasResource :aResourceDescriptor .
:aResourceDescriptor a prof:ImplementationResourceDescriptor ;
    prof:describes :aImplementationResource ; # a shacl file serialised as turtle
    prof:resourceType ex:shacl ;
    dct:format "text/turtle" .

I think we've got something...

Thanks,

Lars

rob-metalinkage commented 6 years ago

The "model it first then optimise the names" approach seems to be gradually bearing fruit. Are we sure prof:describes isn't logically backward though - the implementation resource describes (in a possibly processable way) some aspects of the profile. In the simple case we may have a normative canonical machine-readable resource defining the profile, and that is handled by the (currently mandatory) prof:resourceRole - but we must also allow for cases like DCAT-AP where SHACL constraints are experimental or informative. prof:describedBy works for me however.

larsgsvensson commented 6 years ago

@rob-metalinkage wrote:

Are we sure prof:describes isn't logically backward though - the implementation resource describes (in a possibly processable way) some aspects of the profile.

Obviously we aren't ;-) I thought the ImplementationResourceDescriptor describes the implementation resource, not the Profile. If it were to describe the Profile, I would have called it ProfileDescription. Perhaps "describesImplementationResource", but that feels very verbose.

rob-metalinkage commented 6 years ago

OK - I had jumped up a level - you are right the descriptor indeed "describes"

so it would be:

:aResourceDescriptor a prof:ImplementationResourceDescriptor ;
    prof:describes :aImplementationResource ; # a shacl file serialised as turtle
    dct:conformsTo ex:shacl ;
    prof:resourceRole prof:ConformanceTest ;
    dct:format "text/turtle" .

note resourceType -> dct:conformsTo as it references the relevant standard, which may be a profile itself.
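The revised descriptor can also be sketched as plain data structures, which makes the direction of each link explicit. This is a hedged Python illustration only: the class and field names (`Profile`, `ResourceDescriptor`, `describes`, `conforms_to`, `resource_role`, `media_type`) mirror the properties discussed here but are hypothetical, the example.org IRIs are placeholders, and nothing in it is the ontology itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ResourceDescriptor:
    """Metadata node about one implementation resource (hypothetical names)."""
    describes: str       # IRI of the implementation resource, e.g. a SHACL file
    conforms_to: str     # IRI of the standard the resource follows (dct:conformsTo)
    resource_role: str   # role the resource plays for the profile (prof:resourceRole)
    media_type: str      # serialisation of the resource (dct:format)


@dataclass
class Profile:
    """A profile and the descriptors of the resources that implement it."""
    iri: str
    resources: List[ResourceDescriptor] = field(default_factory=list)

    def resources_with_role(self, role: str) -> List[ResourceDescriptor]:
        """Return the descriptors whose resource plays the given role."""
        return [r for r in self.resources if r.resource_role == role]


# Placeholder IRIs; only the shape of the data matters here.
shacl_descriptor = ResourceDescriptor(
    describes="http://example.org/aImplementationResource",
    conforms_to="http://example.org/shacl",
    resource_role="prof:ConformanceTest",
    media_type="text/turtle",
)
a_profile = Profile(iri="http://example.org/aProfile", resources=[shacl_descriptor])
```

Keeping the descriptor separate from the resource it describes is what leaves room for statements such as role and format without saying anything about the SHACL file's own content.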

larsgsvensson commented 6 years ago

deliberately introducing into the discussion the idea of a canonical role defined in the prof:namespace

Yes, that's definitely worth exploring. My first thought was to define a controlled vocabulary but then it's not extensible for other domains... What kinds of roles do you envision (and what happens if an implementation resource can fulfill more than one role)?

rob-metalinkage commented 6 years ago

I too had started with the idea of a controlled vocabulary - but it is perhaps easier to be extensible and have the core concepts defined in the one place.

I don't see any reason why a class hierarchy cannot also be a skos:Concept hierarchy - classes are just concepts with some additional model defined.

More than one role is easy I guess - just have more than one value for the predicate.

arminhaller commented 6 years ago

Since there are different levels of schema languages that can be used by a profile, as @larsgsvensson mentioned, do we need to consider a different analogy (and abstraction level), i.e. the model-view-controller pattern? Isn't the thing we define as a profile the "view" in the MVC pattern? For Web developers (AngularJS, Django, Rails) this terminology would make a lot of sense.

rob-metalinkage commented 6 years ago

One more wrinkle - if we flatten out the hierarchy for a convenience view, we may need to distinguish between a resource which is inherited and one which is part of the profile itself - i.e. is an inherited set of constraints which may be overridden performing a different role?

aisaac commented 6 years ago

(disclaimer: this is as much a proposal as an attempt to ask, whether I got the discussion right, by trying to relate it to things I know)

How about trying to adapt the Linked Data content negotiation (and httpRange-14) pattern here? I.e. there's a URI for a thing in the real world ("non-information resource"), which redirects to information resources in different formats, which are expected to be ‘descriptions’ of the original resource. The idea is that these different information resources can be consumed by different applications, and thus have different functions.

Adapting to our profile context:

I believe we could say that the IR1, 2, 3 all relate to the profile resource via the ‘prof:describes’. We could type them with a class like prof:ProfileDescription. If we need it, we could subclass this class with prof:HumanReadableDescription (for IR1) and prof:MachineReadableDescription (for IR2 and IR3).

What I'm less sure about is whether these information resources should all be semantically equivalent in the pattern. But I think it can be acceptable that resources are not semantically equivalent, when they try to describe the profile using formalisms (XML, SHACL) that do not have the same expressiveness.

andrea-perego commented 6 years ago

I would support @aisaac's proposal for class names, as it is simpler and more intuitive. I must also say that, after looking closely at the profiledesc vocabulary and the examples included in the different discussion threads, I keep on seeing strong analogies with DCAT (in particular, dcat:Dataset and dcat:Distribution). Based on that, I wonder whether we can simply call the two main classes prof:Profile and prof:ProfileDistribution.

About the distinction between human- and machine-readable profile definitions, I wonder whether there's really the need to make it explicit (and define specific subclasses), or it can rather be inferred by the format or the "standard" (XML Schema, SHACL, etc.) the profile definition conforms to.

Finally, about semantic equivalence of profile definitions: I totally agree this is not a requirement.
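One way to read the suggestion about inferring the distinction: if each profile definition already records the standard it conforms to, a consumer can derive whether it is machine-processable without a dedicated subclass. The Python sketch below assumes a hypothetical set, `PROCESSABLE_STANDARDS`, and a hypothetical function, `is_machine_processable`; the entries are just formalisms mentioned in this thread, and real cases may be less clear-cut than a yes/no answer.

```python
from typing import Optional

# Hypothetical: standards whose documents tools can process directly.
PROCESSABLE_STANDARDS = frozenset({
    "XML Schema",
    "JSON Schema",
    "Schematron",
    "SHACL",
    "ShEx",
})


def is_machine_processable(conforms_to: Optional[str]) -> bool:
    """Infer processability from the standard a profile definition conforms to.

    A definition with no declared standard (for example a PDF guidance note)
    is treated as human-readable only.
    """
    return conforms_to in PROCESSABLE_STANDARDS
```

Under this reading, the human/machine split is a derived view over existing metadata rather than something the vocabulary has to assert.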

kcoyle commented 6 years ago

I'm not sure what problem we are attempting to solve here. For content negotiation, we need

  1. an identifier that allows access to the resource
  2. media type information that informs the ingesting software what it needs to know in order to read/parse the file that it receives
  3. a profile name that can be offered or accepted

About what technology solution the contents of the resource conform to, as Andrea says "it can rather be inferred by the format or the "standard" (XML Schema, SHACL, etc.) the profile definition conforms to". This will be coded in the received resource and therefore does not need to be provided in the negotiation itself. Also note that there is not a bright line between human and machine-readable - cf. XML -> XSLT for the exposition of human-facing documentation carried in the XML file.

The new requirement (point 3 above) as requested in ID5.2 is for "A profile captures additional structural and/or semantic constraints in addition to the media type." What I don't see in the use cases is how such a profile will be addressed in the http header, other than by name. From ID5.2:

"Clients and servers should be able to indicate their compatibility and/or preference for certain profiles. This enables clients to request a resource in a specific profile, in addition to the specific content type it requests. A client should be able to determine which profiles a server supports, and with which content types. A client should be able to look up more information about a certain profile."

The IETF proposal doesn't appear to resolve this either.

Key here is that we are not at any point defining a machine language for profiles, and none yet exists, so it is not really possible to reference such a language or the lack thereof (e.g. "human-readable"). This also makes it difficult to refer to functions like modularity (ID5.3), inheritance, etc. because those are inherently technology-dependent. All that appears to be on the table is a way to negotiate with the indication of additional information about the resource, that is, to be able to give a name to a profile that is recognized by client and server. So far no mechanism for making those names themselves known (like MIME types are known) has been defined, and it seems to me that this is an important element that makes thinking about the profile description quite difficult.
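As a rough illustration of the ID5.2 requirement quoted above, the sketch below has a server hold one representation per (profile name, media type) pair, pick one from a client's ordered preferences, and list what it supports. All names (`REPRESENTATIONS`, `negotiate`, `supported_profiles`), the profile identifiers and the file names are hypothetical, and letting profiles take precedence over media types is an assumption made for the example. No HTTP header syntax is implied.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical server-side table: (profile name, media type) -> representation.
REPRESENTATIONS: Dict[Tuple[str, str], str] = {
    ("dcat", "text/turtle"): "catalog-dcat.ttl",
    ("dcat", "application/rdf+xml"): "catalog-dcat.rdf",
    ("dcat-ap", "text/turtle"): "catalog-dcat-ap.ttl",
}


def negotiate(profiles: List[str], media_types: List[str]) -> Optional[str]:
    """Return the first representation matching the client's ordered preferences.

    Profiles take precedence over media types (an assumption of this sketch);
    None means nothing acceptable is available.
    """
    for profile in profiles:
        for media_type in media_types:
            match = REPRESENTATIONS.get((profile, media_type))
            if match is not None:
                return match
    return None


def supported_profiles() -> Dict[str, List[str]]:
    """List every profile the server supports, with its media types."""
    result: Dict[str, List[str]] = {}
    for profile, media_type in REPRESENTATIONS:
        result.setdefault(profile, []).append(media_type)
    return result
```

The sketch only works because both sides already share the strings "dcat" and "dcat-ap", which is the gap described above: nothing yet says how such profile names become known to clients and servers.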

larsgsvensson commented 5 years ago

As the one who created this issue almost a year ago, I have the impression that we have moved far away from my original question that was "how do we define 'schema' as opposed to 'profile'" where a "schema" is an implementation of a profile using a specific formalisation (e. g. ShEx or XML Schema). In my opinion this is being taken care of by the profiles ontology, particularly through the classes Profile (="profile") and ResourceDescriptor (that might be an implementation of a profile in a specific formalisation and thus can be a "schema"). Thus I'd say that this issue is resolved and can be closed.

aisaac commented 5 years ago

I agree we have made progress, but would be hesitant to close the issue until appropriate wording is added in the documentation. Especially, I'd see that some of @larsgsvensson 's wording in the last comment would be quite nice in sections 3 and 4 of the profile guidance doc: https://w3c.github.io/dxwg/profiles/

kcoyle commented 5 years ago

@aisaac : the text by @larsgsvensson refers specifically to PROF classes, which will probably not be referenced in those sections of the profgui document. Let's hang on to this, but we'll need to re-word it for the guidance document. I'll put a place-holder in the section 3 Google Doc because I don't know if we'll remember this.

aisaac commented 5 years ago

@kcoyle ok I'll do the extraction myself, and now :-) This is what I'm interested in:

a "schema" is an implementation of a profile using a specific formalisation (e. g. ShEx or XML Schema)

a [profile expression/descriptor/component whatever we'll use in the profile guidance doc] is an implementation of a profile in a specific formalisation and thus can be a "schema"

Thanks for having put a link to this in section 3. Actually on second thought it may better fit with the explanation about how a profile is published (section 4) rather than the roles of profiles (section 3). But anyway having the reminder in section 3 will be perfect enough for the moment!

larsgsvensson commented 5 years ago

Interesting discussion (again). The point I unsuccessfully tried to make was that I don't think we need the term "schema" any more.

larsgsvensson commented 5 years ago

Resolved for conneg: https://www.w3.org/2019/03/13-dxwgcneg-minutes.html#x06