w3c / dx-prof

The Profiles Vocabulary
https://w3c.github.io/dx-prof/prof/

PROF roles and their definitions #23

Open kcoyle opened 5 years ago

kcoyle commented 5 years ago

A small set of roles has been included in the PROF document as a starting set. That set is:

There are questions about the definitions in a Google doc that are still not addressed and may need additional discussion. Also, we need a general consensus on whether these are the roles that the group feels are the basic ones to be included.

nicholascar commented 5 years ago

@kcoyle could you please bring the points of concern out of the Google Doc you link to into comments in this Issue so we can address all points in one place? Perhaps an individual comment per role definition would be useful to enable us to reference them via GitHub comment URIs. Thanks.

kcoyle commented 5 years ago

You want me to repeat all of the comments in the Google Doc? If I do, the best thing would be to create an issue per role, because GitHub issues can't be threaded, so commenting on individual roles or definitions would be hard. Let's start with deciding WHICH roles, then we can move on to the definitions, which will probably require new issues. I'll start the WHICH roles.

kcoyle commented 5 years ago

It is somewhat difficult to discuss the roles without reference to their meaning, but here's my basic feeling about basic roles:

  1. Specification is not a role - everything listed here is a specification based on our definition of specification
  2. mapping is a relationship between two data definitions, and there is nothing in prof that allows one to say "a mapped to b", so this isn't covered by prof as defined
  3. most of the terms here are ambiguous in that they could be either human-readable documents or actionable code. Leaving in that ambiguity, to me, makes the roles less than useful. As an example, a human-readable document that includes a vocabulary has a very different use case to an actionable schema. If the role doesn't specify the difference between readable documents and actionable code, then I don't think the roles are terribly useful. You could say that a .ttl or .pdf makes it clear, but I think we can all imagine a .ttl that doesn't do what you expect it to do. Plus, using the format requires additional digging - why not put it directly in the role?

We should discuss which roles we think cover the most common cases. I'll take a guess at: (and what we name them should be a 2nd step)

  1. Human-readable profile vocabulary documentation (may include description of required/desired schema and record formats)
  2. Human-readable business rules (guidance)
  3. Machine-actionable validation rules (it may be useful to call out XSD, SHACL, ShEx, Schematron... others commonly in use?)
  4. Vocabulary that is encoded in a schema language (XSD, OWL, YAML, JSON-S, ????)
  5. Diagrams of the data model (I'm only so-so on this one)
  6. Example sets

A caution about coding the roles: If any of these are interpreted to mean "any document or file that has these characteristics" the resulting metadata may not be very useful. You could have full or partial examples in every one of the roles I list above, but coding all of them as "examples" just means that a person has to look at every document. Some rules about what merits a role, even within a single community, could be very important to "save the time of the user". I think that well-crafted definitions could help with that.

rob-metalinkage commented 5 years ago

I suggest looking at some concrete examples that matter to you and trying to describe the roles they have. Resources can have multiple roles - and be both human readable and machine readable if they are identified by URIs that support content negotiation - so you would need to declare what each conformsTo (its nature) and its available formats. Trying to shoehorn that sort of information into role naming conventions and then asking clients to interpret it would be counterproductive. (I'm not arguing for the existing roles - but we need good evidence of something that cannot be properly described using them, including using multiple roles if needed.)

Looking at the list above, they are mainly mixed concerns better described formally with conformsTo and hasFormat. The one that sticks out as possibly having unique and important semantics is "vocabulary" - what sort of a role is "vocabulary" - can we identify a concrete case in the wild of a resource that performs this role?
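A rough sketch of what that could look like in Turtle, to make the point concrete. Everything under `ex:` is hypothetical, the role local names are assumptions rather than agreed definitions, and the media-type IRI just follows the w3id.org pattern used elsewhere in this thread. The idea is that one resource carries two roles, while its nature (dct:conformsTo) and its format (dct:format) are stated separately instead of being folded into the role name:

```turtle
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix role: <http://www.w3.org/ns/dx/prof/role/> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .

# Hypothetical profile with a single resource that plays two roles
ex:profile-x a prof:Profile ;
    prof:hasResource ex:profile-x-shapes .

ex:profile-x-shapes a prof:ResourceDescriptor ;
    rdfs:label "Profile X shapes file (hypothetical)" ;
    # two roles on the same resource (role names are assumptions)
    prof:hasRole role:validation , role:constraints ;
    # the nature of the resource: which specification it conforms to
    dct:conformsTo <https://www.w3.org/TR/shacl/> ;
    # the format a client actually receives
    dct:format <https://w3id.org/mediatype/text/turtle> ;
    prof:hasArtifact <http://example.org/profile-x/shapes.ttl> .
```

A client that only cares about SHACL can filter on dct:conformsTo, and one that only cares about the purpose can filter on prof:hasRole, without either having to parse a naming convention.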

kcoyle commented 5 years ago

Actually, I included some examples in my list, but here are some more:

  1. Human-readable profile vocabulary documentation (may include description of required/desired schema and record formats) - example: DCAT-AP.pdf, DCAT-AP-SE.pdf, GeoDCAT-AP-IT.pdf, etc.
  2. Human-readable business rules (guidance) - example: DCAT-AP.pdf, et al
  3. Machine-actionable validation rules (it may be useful to call out XSD, SHACL, ShEx, Schematron... others commonly in use?) Examples: Europeana Schematron, BIBFRAME/SHACL, DCAT-AP SHACL
  4. Vocabulary that is encoded in a schema language (XSD, OWL, YAML, JSON-S, ????) Example: DCAT-AP .ttl, BIBFRAME .ttl
  5. Diagrams of the data model (I'm only so-so on this one) example: BIBFRAME diagram, Figure 1 - DCAT Application Profile UML Class Diagram. (The reason I'm only so-so on this is that the diagrams are rarely stand-alone, usually within a human-readable document, and may not be worth calling out on their own)
  6. Example sets - example: DCAT sets in dxwg github

rob-metalinkage commented 5 years ago

need to revisit the thinking here - DCAT-AP is a "logical profile" not a document/resource performing some role of expressing it.

For each example I suggest you clearly articulate the profile and the resource - so that the role of the association between the two can be made clear.
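A minimal sketch of that separation, assuming hypothetical `ex:` identifiers (these are not the real DCAT-AP IRIs) and an assumed role local name. The profile is the logical thing, the resource descriptor is the document that expresses it, and the role sits on the association between the two:

```turtle
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix role: <http://www.w3.org/ns/dx/prof/role/> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .

# The logical profile: an abstract thing, not a file
ex:dcat-ap a prof:Profile ;
    rdfs:label "DCAT-AP (hypothetical identifier)" ;
    prof:isProfileOf <https://www.w3.org/TR/vocab-dcat/> ;
    prof:hasResource ex:dcat-ap-pdf .

# The resource: one concrete document that expresses the profile
ex:dcat-ap-pdf a prof:ResourceDescriptor ;
    rdfs:label "DCAT-AP specification document (PDF)" ;
    dct:format <https://w3id.org/mediatype/application/pdf> ;
    # the role this document plays for the profile (assumed role name)
    prof:hasRole role:guidance ;
    # hypothetical download location
    prof:hasArtifact <http://example.org/dcat-ap/dcat-ap.pdf> .
```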

kcoyle commented 5 years ago

This is essentially your example number 5, where the DCAT-AP document has role:Guidance. Are you objecting that I didn't say "DCAT-AP PDF or Word document"? Honestly, really? OK, so just add the word "document" after each example above.

<https://joinup.ec.europa.eu/rdf_entity/http_e_f_fdata_ceuropa_ceu_fw21_f6f27f059_bf785_b4d7d_bb602_b6448aab73bd5> a prof:ResourceDescriptor ;
    rdfs:label "DCAT-AP Guidance Document (Word)" ;
    dct:format <https://w3id.org/mediatype/application/msword> ;
    prof:hasRole role:Guidance ;

rob-metalinkage commented 5 years ago

@kcoyle - actually it is necessary to be explicit and correct in this context.

kcoyle commented 5 years ago

@rob-metalinkage I fixed the only ambiguous statements - the ones like "Europeana schematron", while not including a link, surely you can figure out. For any where I refer to a document, I added the document file extension.

rob-metalinkage commented 5 years ago

relating to some specific points above

1) "specification" - may be problematic and not sufficiently useful - happy to drop and reintroduce an equivalent if we have an argued case with an example.

2) "mapping" was introduced as a result of feedback in w3c/dxwg#847, and although it's not teased out in a UC, it's something I am finding is needed when describing activities, for example in the Citizen Science area - where one of the most valuable pieces of information is how a project schema maps to various alternative data standards - this relationship often carries most of the documentation about the semantic constraints on a given element.

3) "most of the terms here are ambiguous in that they could be either human-readable documents or actionable code." is by design - it doesn't make them ambiguous w.r.t. role at all - and other metadata indicates the format (and information profile if desired), which determines if a resource is actionable by a specific client.
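To illustrate point 3, here is a sketch under stated assumptions: the `ex:` names are hypothetical, and `role:constraints` is used only as a stand-in role name. Two resources play the same role for the same profile, and it is the format and conformance metadata, not the role, that tells a client whether it can act on them:

```turtle
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix role: <http://www.w3.org/ns/dx/prof/role/> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .

ex:profile-x a prof:Profile ;
    prof:hasResource ex:rules-shacl , ex:rules-pdf .

# Machine-actionable: a SHACL-aware client can select and run this one
ex:rules-shacl a prof:ResourceDescriptor ;
    prof:hasRole role:constraints ;
    dct:conformsTo <https://www.w3.org/TR/shacl/> ;
    dct:format <https://w3id.org/mediatype/text/turtle> ;
    prof:hasArtifact <http://example.org/profile-x/shapes.ttl> .

# Human-readable: same role, different format, no machine-readable conformance claim
ex:rules-pdf a prof:ResourceDescriptor ;
    prof:hasRole role:constraints ;
    dct:format <https://w3id.org/mediatype/application/pdf> ;
    prof:hasArtifact <http://example.org/profile-x/rules.pdf> .
```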

Your list is a good list of examples of resources - we actually need to tease out the general cases they illustrate and define role identifiers and descriptions that match these. There is a straw man for these already based on a combination of UC and feedback. This is the starting point.

The process now must be to raise a separate issue for each case where one of the following cases applies:

a) a resource fits the role but the name or description can be improved to make it clearer
b) a resource almost fits a role and the role should be generalised to allow it to match better
c) a resource can be shown not to fit any role and is an important enough case that we could all agree it's worth extending
d) it fits OK, but it is felt worthwhile using as an example to help people see how these roles work for familiar objects

happy to leave this issue open as a reminder we need to have a statute of limitations for provision of new suggestions.

kcoyle commented 5 years ago

We can have a desired goal for finalizing this, but cannot consider it resolved until consensus is reached.

@rob-metalinkage Are you intending to create the separate issues? I've done this one, but resolving this is essentially a task for the editors. (And I don't intend to become an editor on this deliverable ;-)). Thanks.

rob-metalinkage commented 5 years ago

This was discussed in the PROF meeting and all agree it's an area where we know we only have a first cut. Roles may have many variations and nuances, which are business for communities of practice; however, in accordance with @aisaac's suggestion, if we have a set of useful ones that we have consensus about, we can bring them into the core, leaving "at risk" ones in an extension vocabulary. This is the main open issue that looks like it may result in some refactoring of and/or changes to the normative model - and hence it's a priority to discuss in meetings and achieve consensus. The Google doc is just background we can go back to - but any substantive issue raised in it and not addressed needs to be brought into the most appropriate, resolvable, issue.

Within that strategy, separate issues could be created to isolate specific proposals or illustrate evidence where a known example does not fit well with the options proposed.

In this issue we can address general questions - and so I will lead off with a separate comment.

rob-metalinkage commented 5 years ago

As we have seen it is difficult to resolve the nuances and difficulty in modelling roles and mapping them to all the different languages different communities use for similar things.

I think we need to focus on the fact that resources may play more than one role (this is why properties defining role relationships are not a good option IMHO, in addition to the difficulty of nailing them all down in advance which properties need more than role identifiers)

so

0) understanding that in the model roles are "qualifiers" on a relationship, seek to improve the name and/or definition to make this clearer.

and

1) bring a small number of less controversial roles into a normative document
2) push roles where we have insufficient evidence or time to reach consensus about definition - but which are needed for implementations in particular - to an extensible roles vocabulary that can be managed by a community group in practice
3) determine a strategy that roles should be simple - addressing a single aspect of the relationship - and additive to create richer semantics
4) better describe how certain semantics are already handled by other more specific properties - such as the media-type (it's SHACL or ShEx) and profiles (what data model the resource dct:conformsTo)
5) use words with existing meanings to define aspects of roles:
   1) "normative" and "informative" (disjoint semantics - but may be used in combination if the resource contains both)
   2) "shall" "should" "may" (to distinguish levels of conformance - a single resource may combine these - but would need to distinguish internally somehow)
   3) "guidance" - the Profiles guidance document would have this role relationship with the PROF vocab!
   4) "example"
6) make sure we have examples for the complex cases where one resource contains multiple logical objects and performs multiple roles (such as a document that contains mandatory testable things, recommendations, guidance etc.)
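As a sketch of the additive approach, assuming a purely hypothetical extension vocabulary (`xrole:` and every term in it are invented for illustration and are not part of PROF or any agreed roles list), a single combined document could simply carry several simple roles at once:

```turtle
@prefix prof:  <http://www.w3.org/ns/dx/prof/> .
@prefix dct:   <http://purl.org/dc/terms/> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:    <http://example.org/> .
# Hypothetical community-managed roles vocabulary
@prefix xrole: <http://example.org/roles-ext/> .

ex:profile-x a prof:Profile ;
    prof:hasResource ex:combined-spec .

# One document containing testable requirements, guidance and examples
ex:combined-spec a prof:ResourceDescriptor ;
    rdfs:label "Profile X combined specification (hypothetical)" ;
    # each role addresses a single aspect; together they describe the document
    prof:hasRole xrole:normative , xrole:guidance , xrole:example ;
    dct:format <https://w3id.org/mediatype/text/html> ;
    prof:hasArtifact <http://example.org/profile-x/spec.html> .
```

This says nothing about which part of the document plays which role, which is the part-whole question.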

This doesn't tell us how to handle the "partOf" cases (including the "single combined validation view"). Three options:

1) propose, gain consensus on, and formalise names and definitions for these qualifiers (perhaps adopting something like https://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/)
2) flag these as "at risk" in the roles vocabulary as additional qualifiers
3) define a "closed world" semantics where there is a requirement that the set of "normative" resources defines the full set of constraints.

At any rate if needed I would recommend reading https://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/ to see the same sort of patterns we are facing here and the computer-sciency terms used for them.

kcoyle commented 5 years ago

also w3c/dxwg#638 has meaty comments on this

kcoyle commented 3 years ago

The comments made in the Google Doc have not been taken into account. Will they be addressed?

rob-metalinkage commented 3 years ago

Role definitions have been split out from PROF, as the focus was on getting the PROF data model finalised. It has proven useful and stable so far under implementation, so it is definitely time to revisit the roles vocabulary, and we will indeed need to revisit these comments. However, Google Docs should not be part of the documentation trail here: they are fine for an ephemeral discussion, but outcomes, whether resolutions or open questions, should be handled by issues. Such issues need to be flagged as roles-only, so they don't interfere with the visibility of PROF issues.

rob-metalinkage commented 3 years ago

Have re-reviewed this as it was a long time ago - most comments have already been covered by PROF design that separates role semantics from the form of the resource performing the role.

A couple of improvements in wording are indicated. We should however do a review of wording of all of these in the light of proposed new roles to make sure they are semantically disjoint and as clear as we can be.