w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

Profile Composition and Languages #162

Open VladimirAlexiev opened 6 years ago

VladimirAlexiev commented 6 years ago

Submitting a new USE CASE, strongly related to ID3


Profile Composition and Languages

Status:

Identifier: ID52 (proposed)

Creator: Vladimir Alexiev, Ontotext

Deliverable(s): AP Guidelines, Content Negotiation

Tags

profile_negotiation profile representation composition

Stakeholders

data consumer, data producer

Problem statement

As stated by ID3, the response of a server can conform to multiple profiles. However, I believe it is unclear how these profiles should compose. Different languages for defining profiles have different "composition" capabilities. Possible profile languages and composition mechanisms should be described.

Examples:

Existing approaches

The Expected RFC states one such mechanism for XML schemas:

the elements in namespace urn:example:namespaces:ns1 must conform to XML schema http://example.com/schema/schema-1 and the elements in namespace urn:example:namespaces:ns2 must conform to XML schema http://example.com/schema/schema-2

  • It is unclear how applicable or useful this separation by namespace is. E.g. NIEM includes much more advanced mechanisms for composing XML schemas.
  • It is unclear whether such separation by namespace is applicable/useful for RDF.
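To make the namespace-based mechanism quoted above concrete, here is a minimal sketch in Python. It only shows the dispatch step (deciding which schema governs which element), not the validation itself. The function names and the sample document are assumptions made for illustration; the two namespace and schema URIs are the ones from the quoted text.

```python
import xml.etree.ElementTree as ET

# Mapping from the quoted example: namespace -> XML schema URI.
NAMESPACE_TO_SCHEMA = {
    "urn:example:namespaces:ns1": "http://example.com/schema/schema-1",
    "urn:example:namespaces:ns2": "http://example.com/schema/schema-2",
}


def namespace_of(tag):
    """Return the namespace of an ElementTree tag such as '{ns}local', or None."""
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def group_elements_by_schema(xml_text):
    """Group element local names by the schema that governs their namespace."""
    groups = {}
    for element in ET.fromstring(xml_text).iter():
        # schema is None when no declared profile covers the element's namespace
        schema = NAMESPACE_TO_SCHEMA.get(namespace_of(element.tag))
        local_name = element.tag.split("}", 1)[-1]
        groups.setdefault(schema, []).append(local_name)
    return groups


# Hypothetical sample document mixing both namespaces.
SAMPLE = """<a:record xmlns:a="urn:example:namespaces:ns1"
                      xmlns:b="urn:example:namespaces:ns2">
  <a:title>Example</a:title>
  <b:rights>CC0</b:rights>
</a:record>"""

groups = group_elements_by_schema(SAMPLE)
```

The sketch also hints at why this does not carry over to RDF directly: a single triple can combine a subject, predicate and object from three different namespaces, so there is no obvious "element" to assign to one schema.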

Requirements

Related use cases

ID3, RPFDF, RPFN

jpullmann commented 6 years ago

Vladimir, you already provided some examples of "profile definition languages", whose formalization level is decisive for operations like composition (+ conflict resolution), referencing etc. There are UCs supporting the usage of profiles, mainly ID2, ID30, ID41, and ID46, while none deals specifically with the formalization of profiles. I'd suggest coordinating with @RubenVerborgh, @larsgsvensson, and @rob-metalinkage on the creation of a new, dedicated use case on "Profile specification(s)".

ID3 implies the usage/application of multiple profiles at once, where profiles are implicitly merged/combined, which your UC seems to address. Here, based on "Profiles specification(s)", the issues of conflict resolution etc. come into play.

VladimirAlexiev commented 6 years ago

@jpullmann thanks for the good overview! Awaiting a reaction from @RubenVerborgh, @larsgsvensson, and @rob-metalinkage.

I think these are complex issues: how to compose profiles, described with different technologies, that can be used to validate data or to understand what data to expect from a dataset. It would be great if the DXWG could resolve them. But I'm a bit afraid that they may go beyond the mission of DCAT, which is perhaps just to describe, not to make technological advances in validation.

rob-metalinkage commented 6 years ago

Hi,

I have already created a "straw man" for a lightweight description of profiles independent of the profile specification language - potentially as part of dcat - since this is a concern for describing data sets and extends dct:conformsTo and dc:Standard just enough to declare relationships and bind to profile description resources. This is intended to support existing profile hierarchies (described in documents) as well as profile descriptions in OWL, SHACL or other choices.

It's at https://github.com/w3c/dxwg/tree/roba-profile/dcat/rdf

It's agnostic about the composition method, but the definition of profile adoption does require constraints to be transitive - a profile cannot relax or change the sense of an inherited constraint.

I think it's a good point to decide if this is in scope and implied by current requirements, or if we want a more explicit UC and requirements.

I don't believe we are in a position to require or develop a specific profile description vocabulary - but I would expect that DCAT profile guidance would be able to recommend use of W3C vocabularies such as SHACL and RDF-Datacube
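The transitivity requirement on profile adoption stated in this comment (a profile cannot relax an inherited constraint) can be illustrated with a small Python sketch. It rests on a strong simplifying assumption: constraints are modelled only as (min, max) cardinalities per property, which is far narrower than what real profiles express. The class and function names are hypothetical and are not taken from the straw-man vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class ConstrainedProfile:
    name: str
    # property name -> (min_count, max_count); max_count None means unbounded
    constraints: dict = field(default_factory=dict)
    parents: list = field(default_factory=list)  # profiles this one adopts


def inherited(profile):
    """Constraints from all ancestors, transitively; the tightest bounds win."""
    merged = {}
    for parent in profile.parents:
        for source in (inherited(parent), parent.constraints):
            for prop, (low, high) in source.items():
                old_low, old_high = merged.get(prop, (0, None))
                if old_high is None:
                    new_high = high
                elif high is None:
                    new_high = old_high
                else:
                    new_high = min(high, old_high)
                merged[prop] = (max(low, old_low), new_high)
    return merged


def is_narrower_or_equal(child, parent):
    """True if the child cardinality range lies within the parent range."""
    (c_low, c_high), (p_low, p_high) = child, parent
    lower_ok = c_low >= p_low
    upper_ok = p_high is None or (c_high is not None and c_high <= p_high)
    return lower_ok and upper_ok


def relaxed_properties(profile):
    """Properties whose own constraint is looser than what the profile inherits."""
    bounds = inherited(profile)
    return sorted(
        prop
        for prop, own in profile.constraints.items()
        if prop in bounds and not is_narrower_or_equal(own, bounds[prop])
    )


# Hypothetical three-level hierarchy.
base = ConstrainedProfile("base", {"title": (1, None)})
national = ConstrainedProfile("national", {"title": (1, 1)}, [base])
bad_local = ConstrainedProfile("bad-local", {"title": (0, 1)}, [national])
good_local = ConstrainedProfile("good-local", {"title": (1, 1), "theme": (1, None)}, [national])
```

In this sketch `bad_local` makes `title` optional although both ancestors require it, so a check of this kind would flag it, while `good_local` only adds a new constraint and passes.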

dr-shorthair commented 6 years ago

@rob-metalinkage Could you make some documentation of the strawman so it doesn't depend on us inferencing (mind-reading) from peering into the RDF? Maybe in the Wiki here https://github.com/w3c/dxwg/wiki/DCAT-Profiles-Topics

larsgsvensson commented 6 years ago

@VladimirAlexiev Thanks for your feedback. Here a late reaction from me...

The Expected RFC states one such mechanism for XML schemas:

the elements in namespace urn:example:namespaces:ns1 must conform to XML schema http://example.com/schema/schema-1 and the elements in namespace urn:example:namespaces:ns2 must conform to XML schema http://example.com/schema/schema-2

It is unclear how applicable or useful this separation by namespace is. E.g. NIEM includes much more advanced mechanisms for composing XML schemas.

It is unclear whether such separation by namespace is applicable/useful for RDF.

This draft is definitely not state-of-the-art any more. It sort of mixes profiles and schemas and doesn't really cater for the media type independence we're aiming at when discussing profiles. You mention SHACL and JSON-LD Frames as possibilities for profile languages (what about JSON Schema?). In my terminology SHACL, ShEx, JSON Schema and others of that ilk are not profile languages but schema languages, since they are geared at describing and validating data described using a specific technology (RDF, JSON etc.) but cannot be used to validate the same data if it's published in another format.

One example from our place: As a national library we serve other libraries around the world and they expect the data to be available in MARC. Our data is available in two serialisations: MARC 21 (based on ISO 2709) and MARC-XML. The way MARC-XML is built, the semantics are exactly the same as in ISO MARC and you can convert losslessly between the two formats, but there it ends. Still we want to be able to describe how the content is organised, which fields we use etc. independently of whether it's published in ISO MARC or MARC XML. So that would be a task for the profile. We then can create an XML schema to describe our version of MARC XML and whatever-you-can-use-to-validate-ISO-2709 to describe the ISO MARC version.

TL;DR Profiles != Schemas

  • Profile: media type independent
  • Schema: technology dependent (since you can describe all RDF serialisations with SHACL independent of media type...)

larsgsvensson commented 6 years ago

@VladimirAlexiev

I think these are complex issues: how to compose profiles, described with different technologies, that can be used to validate data or to understand what data to expect from a dataset. It would be great if the DXWG could resolve them. But I'm a bit afraid that they may go beyond the mission of DCAT, which is perhaps just to describe, not to make technological advances in validation.

Well the WG isn't only about DCAT but we do have profiles on our agenda, too. Glad to hear that you think of that as a technological advancement!

I don't have an answer on how to do profile composition (yet) but I hope we'll find a solution. What we need is a (technology-independent?) meta-model of profiles including composition mechanisms. And there needs to be a standard way to navigate from a profile to the ones it's composed of and a standard way to navigate from it to (technology-dependent) schemas implementing it.
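As a rough illustration of the two kinds of navigation asked for here (from a profile to the profiles it is composed of, and from a profile to the schemas implementing it), the following Python sketch uses a tiny in-memory model. Everything in it is an assumption made for illustration: the class, the field names `composed_of` and `schemas`, and the example.org URIs. It is not a proposal for the actual meta-model.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    uri: str
    composed_of: list = field(default_factory=list)  # profiles this one is built from
    schemas: list = field(default_factory=list)      # URIs of schemas implementing it


def component_profiles(profile):
    """Return every profile reachable through composed_of, each listed once."""
    found = []
    seen = {profile.uri}
    queue = list(profile.composed_of)
    while queue:
        current = queue.pop(0)
        if current.uri in seen:
            continue  # already visited; this also guards against cycles
        seen.add(current.uri)
        found.append(current)
        queue.extend(current.composed_of)
    return found


def implementing_schemas(profile):
    """Schemas of the profile itself, followed by those of its components."""
    uris = list(profile.schemas)
    for component in component_profiles(profile):
        for schema in component.schemas:
            if schema not in uris:
                uris.append(schema)
    return uris


# Hypothetical profiles: "app" is composed of "core" and "geo"; "geo" reuses "core".
core = Profile("http://example.org/profile/core",
               schemas=["http://example.org/schema/core.xsd"])
geo = Profile("http://example.org/profile/geo", composed_of=[core],
              schemas=["http://example.org/schema/geo.shacl.ttl"])
app = Profile("http://example.org/profile/app", composed_of=[core, geo])
```

Note that the sketch says nothing about conflict resolution between composed profiles, which is the harder part of the question.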

larsgsvensson commented 6 years ago

@rob-metalinkage

I have already created a "straw man" for a lightweight description of profiles independent of the profile specification language - potentially as part of dcat - since this is a concern for describing data sets and extends dct:conformsTo and dc:Standard just enough to declare relationships and bind to profile description resources. This is intended to support existing profile hierarchies (described in documents) as well as profile descriptions in OWL, SHACL or other choices.

I followed @dr-shorthair 's suggestion and pasted one of your examples into the Wiki with some comments from me for further discussion

I don't believe we are in a position to require or develop a specific profile description vocabulary - but I would expect that DCAT profile guidance would be able to recommend use of W3C vocabularies such as SHACL and RDF-Datacube

I think we can, perhaps not as a recommendation, but at least as a WG note

VladimirAlexiev commented 6 years ago

what about JSON Schema?

Sure.

they are geared at describing and validating data described using a specific technology

But are there successful languages for universal data description? (The book Validating RDF Data has some great examples related to HL7 that are rendered in XML Schema, ShEx and I think another formalism)

independently of whether it's published in ISO MARC or MARC XML. So that would be a task for the profile

Ah but this is a very narrow example. MARC XML uses the same MARC lingo (tags, subtags etc) and is a mirror image.

We then can create an XML schema to describe our version

I doubt you can do it with XML Schema, I think you have to also use Schematron for the cross-field rules.

I can give another narrow example: Implementing complex e-Government solutions with open source and BPM: Architecture of Export Control System phase 2 (ECS2). Alexiev, V.; Mitev, A.; and Bukev, A. Java2Days Conference, Sofia, Bulgaria. 2010. http://vladimiralexiev.github.io/pubs/AlexievMitevBukev2010-eGovBPM.pdf See slide 18 and 33-40. We used an XPath cross-field validation language (similar to Schematron but simpler) that was used for XML, Java beans, and the interactive UI.

need a (technology-independent?) meta-model of profiles including composition mechanisms

I think we do: even if we can't develop a technology-independent language for defining (implementing) profiles, we can develop one for composing profiles.

larsgsvensson commented 6 years ago

they are geared at describing and validating data described using a specific technology

But are there successful languages for universal data description?

I don't know. But a profile description language should be as media-type independent as possible.

(The book Validating RDF Data has some great examples related to HL7 that are rendered in XML Schema, ShEx and I think another formalism)

I'll have a look at those

We then can create an XML schema to describe our version

I doubt you can do it with XML Schema, I think you have to also use Schematron for the cross-field rules.

OK, might be, and that was not the point I was aiming at. The point is that there are several flavours of MARC 21 (not counting all flavours of MARC). In his MARC validator, Péter Király mentions six different ones that all ought to be proper MARC 21 but where there is a choice of where you can put the information, and some suppliers do it like this and some like that. To me those are different profiles of MARC 21.

need a (technology-independent?) meta-model of profiles including composition mechanisms

I think we do: even if we can't develop a technology-independent language for defining (implementing) profiles, we can develop one for composing profiles.

OK, then let's do it!

azaroth42 commented 6 years ago

I agree with Vlad here. There seems to be confusion about whether the profile is a set of validation rules, or if it's something else; for example

Profiles may be written in or may link to a document or schema in a validation language (ShEx, SHACL, XMLschema). [ID41] (5.41)

I think this issue is clear that the Profile and schema/validation language are separate resources, and thus profiles can only "link to" those machine processable constraint descriptions. That a profile could then link to many such descriptions, for different formats (JSON-schema, SHACL/ShEx, xml schema / schematron, etc.), thereby resolving Lars' definition that the profile is technology independent.
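The arrangement described in this comment, one technology-independent profile that links to several technology-dependent constraint documents, could look roughly like the following Python sketch. The profile URI, the schema URIs and the lookup function are all hypothetical; the choice of media types as lookup keys is also an assumption, not something the thread has agreed on.

```python
from typing import Optional, Tuple

# Hypothetical registry: profile URI -> media type -> (schema language, document URI).
PROFILE_LINKS = {
    "http://example.org/profiles/catalogue": {
        "application/json": ("JSON Schema", "http://example.org/schemas/catalogue.schema.json"),
        "application/xml": ("XML Schema", "http://example.org/schemas/catalogue.xsd"),
        # One SHACL document can serve several RDF serialisations.
        "text/turtle": ("SHACL", "http://example.org/schemas/catalogue.shacl.ttl"),
        "application/ld+json": ("SHACL", "http://example.org/schemas/catalogue.shacl.ttl"),
    }
}


def constraint_document(profile_uri: str, media_type: str) -> Optional[Tuple[str, str]]:
    """Return (schema language, document URI) for a profile in a given media type.

    Returns None when the profile is unknown or has no machine-processable
    constraints for that media type; the profile itself may still apply.
    """
    return PROFILE_LINKS.get(profile_uri, {}).get(media_type)


turtle = constraint_document("http://example.org/profiles/catalogue", "text/turtle")
csv = constraint_document("http://example.org/profiles/catalogue", "text/csv")
```

A missing entry (as for `text/csv` here) would then mean "no validator is linked for this format", not "the data does not conform to the profile", which keeps the profile itself independent of any one technology.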

azaroth42 commented 6 years ago

Housekeeping: Can someone add content_negotiation tag to the issue please

kcoyle commented 6 years ago

Hmmm. We seem to have "profile-negotiation" as a tag but not "content-negotiation". I added "content-negotiation" but I'm not sure it's a good idea (we've suffered label proliferation in the recent past). I'll leave it there and see what folks think. Meanwhile, I added the "profile-negotiation" label.

azaroth42 commented 6 years ago

Sorry, I meant profile_negotiation :) I was confused by Vlad's tag in the issue text.

rob-metalinkage commented 6 years ago

@azaroth42 "I think this issue is clear that the Profile and schema/validation language are separate resources, and thus profiles can only "link to" those machine processable constraint descriptions. That a profile could then link to many such descriptions, for different formats (JSON-schema, SHACL/ShEx, xml schema / schematron, etc.), thereby resolving Lars' definition that the profile is technology independent."

@larsgsvensson " I think we do: even if we can't develop a technology-independent language for defining (implementing) profiles, we can develop one for composing profiles.

OK, then let's do it!"

this is exactly the motivation for the ProfileDesc vocabulary :-) Please identify where it succeeds or fails to meet these goals and also make sure the Use Cases and Requirements adequately drive this if there is any doubt.

kcoyle commented 6 years ago

@azaroth42 As to whether profiles and schema languages are necessarily separate "things" - there are folks who wish to use SHACL or ShEx as profile languages, or at least as the basis for a profile. I haven't seen an example so I don't know what that looks like in comparison to, say, DCAT-AP. Presumably you could add properties for instructions, examples, definitions, etc. to a validation language and that may be sufficient in some cases. I have doubts about the human-friendliness of that solution, but for sure we should allow for profiles as human-readable documents and coded validation as separate, with all of the downsides of expressing some of the same things in two different places.

kcoyle commented 6 years ago

@rob-metalinkage It is incumbent on you to show how/if profileDesc meets these requirements if you believe that it does. Note that as of yet there is no documentation that gives the goals of PD, the scope, definitions of terms, etc., and no documentation that links it to specific requirements. We have talked about this before. Also, all work must follow the W3C process, which moves from use cases to requirements to problem statements and then solutions, all done through open meetings of group members, agendas, minutes, assigned actions and consensus on solutions. We cannot work backward from solutions that have been developed outside of this process. The sooner we confirm the use cases and requirements the sooner we can begin to work on solutions as a group, as required by W3C procedure.

If you are aware of missing use cases, please suggest them. Also, as we complete the discussion of requirements a gathering of requirements that are needed to "describe" profiles (perhaps as a github issue for discussion) would be useful. That could help us scope that particular function.

dr-shorthair commented 6 years ago

@kcoyle wrote:

Also, all work must follow the W3C process, which moves from use cases to requirements to problem statements and then solutions, all done through open meetings of group members, agendas, minutes, assigned actions and consensus on solutions.

Certainly this is a common and good practice, but I disagree that strictly "We cannot work backward from solutions that have been developed outside of this process." Sometimes prototype solutions are the most efficient way to document requirements.

Looking at the W3C process document [1] I see no strict rule about how a WG operates. For example, there is no mention of use-cases. The DXWG charter [2] doesn't seem to specify a strict process either. Perhaps there is another document you are leaning on? I certainly respect a process that starts with use cases and derives requirements before engineering a solution, but the waterfall method is not the only way to get results, and has its own well known risks.

My understanding of the W3C process is that it is more permissive and flexible than this. Yes, all significant decisions should be documented. And Use Cases and derived Requirements can be important reference points, and are a common artefact in W3C work. But I don't even think that a formal UCR is strictly mandatory? As I understand it, each working-group is at liberty to decide on its own internal mode of operation. Other groups that I was involved in used the UCR to trigger the work, and to provide a check list for the outputs, but the outputs also included material that couldn't be strictly traced back to a specific use case, and during the process were also willing to consider ready-made solutions outside the UCR process.

Perhaps you are suggesting that the DXWG has agreed to follow a strict process? Can you point to where this is recorded? Otherwise, I would suggest that there is no impediment to at least considering proposals that arise during the development of our work, even if they came from some parallel or external stream.

[1] https://www.w3.org/Consortium/Process/ [2] https://www.w3.org/2017/dxwg/charter

kcoyle commented 6 years ago

Simon, a number of things.

1) In terms of considering profileDesc, we aren't there yet. We are still developing our requirements around profiles and conneg and do not have a clear idea of what they will be. That makes it hard to evaluate a solution, which may explain the next point:

2) The profileDesc ontology and diagrams have been available to the group for months and anyone who wishes to look at them is free to do so. The complaint is that no one has engaged with them. That is definitely an option that group members have - to not engage with it until/if it becomes a critical path work item that fulfills needs they have. The proposal has been offered, it has been discussed in messages and at F2F3, and there has not been uptake by the group as a whole. The horse has been led to water ... I have now suggested more than once that documentation would help others understand better what the proposal is. Clearly, what's there isn't grabbing people's attention in the way that the authors desire.

3) There are alternate views that have been expressed for description of profiles, and when we get to that point, we will discuss them as well.

4) W3C process requires transparency to remain a neutral ground for standards development. You yourself have provided a truly exemplary sub-group process transparency in the DCAT group, and we worked to make sure that agendas and minutes would be available on the wiki. I believe we have consensus on that procedure. There are updates being made to profileDesc in github without any visible discussion. That means there isn't much for folks to engage with. Transparency helps give people something to think about; involvement of the group is what brings buy-in.

5) Chairs also must make sure that we stay within our charter and as you know we are trying to locate the description of profiles within our three deliverables, as based on the requirements.

6) If you don't like the process or the decisions of the chairs (and Peter and I have both supported the process we are using, which was begun when Caroline was with us) you can take that to W3C management. They may have helpful advice for us, and I would welcome that.

akuckartz commented 6 years ago

Also, all work must follow the W3C process, which moves from use cases to requirements to problem statements and then solutions ...

I doubt that such a process is required by the W3C. It might be required by a WG, but then this requirement should be documented. Otherwise the process is intransparent.

kcoyle commented 6 years ago

Please see the charter[1] with our timeline (3.3) that includes a deadline for the use cases and requirements. Although the creation of a formal UCR document is optional, the use cases and requirements are not. The purpose of the UCR document is mainly to solicit wide agreement on the scope of the work.

Also note that this group has existed for over a year and we have been openly pursuing these goals all of that time. The work itself is documented in detail in our agendas and minutes, the UCR document was published as a FPWD (which makes it official in W3C terms). It seems somewhat odd to be questioning the process now. However, if there are concrete suggestions for change we can run them past the W3C representatives we have and also past the group as a whole.

That said, there is nothing keeping the group from considering the profileDesc other than its own will. Using the process we have been engaged in is a suggestion to increase the chances that the group will consider that proposal because it puts it in the context of our work.

[1] https://www.w3.org/2017/dxwg/charter

nicholascar commented 6 years ago

@kcoyle I'm keen to improve the documentation around profileDesc, including further detailing Use Cases, for many reasons, one of which is to increase this group's understanding of it and thus the group's engagement with it.

However, can we do a status check before I do that please? This is a long-running thread (first posts back in March) and much has been done in the profiling space in this group since.

Can I please check these assumptions:

We have a growing list of profiling requirements confirmed as in scope and use cases, like #239 that describe the motivation for particular approaches. We also have well discussed issues such as profileDesc and the Guidance document that indicate great engagement with profileDesc in particular and in profile guidance and implementations of profileneg in general.

So what then is missing? Do we perhaps need to better indicate which parts of the things we are working on (guidance doc, specific implementations) have Use Cases and Requirements clearly detailed?

I gather you think there is a Use Case or two for profileDesc missing? We have 239 for the Alternates View approach but do you think we are missing such for profileDesc?

VladimirAlexiev commented 6 years ago

rob-metalinkage commented 6 years ago

https://github.com/w3c/dxwg/tree/gh-pages/profiledesc

"full" and "simplified" seem to be more distinct frame based profiles of a common profile that determines meaning (yet another example of an hierarchy)

rob-metalinkage commented 6 years ago

I also think the level of fields is more granular than we need to worry about - that's more of an API concern. Describing the availability (and usage) of sets of fields is more relevant to make statements about interoperability of data (which is why it's more a DCAT issue than a conneg issue, in spite of the word profile appearing...)

kcoyle commented 6 years ago

@nicholascar I think your questions are questions for the group. We can either add this to an upcoming agenda, or perhaps a separate github issue with an email asking folks to weigh in.

I don't know if there are missing use cases - and we haven't finished going through the profile/conneg requirements. Ideally those of you who know profileDesc would do that analysis based on the requirements listed as of today[1].

[1] https://docs.google.com/document/d/13hV2tJ6Kg2Hfe7e1BowY5QfCIweH9GxSCFQV1aWtOPg/edit

azaroth42 commented 6 years ago

Rob A:

"full" and "simplified" seem to be more distinct frame based profiles of a common profile that determines meaning (yet another example of an hierarchy)

Yes, a semantic/data model profile and a materialization profile. I would go one level deeper to say there's a conceptual profile behind the data model, which is then mapped to various ontological models. And perhaps one level further out as well for strict API concerns.

  • Conceptual: The human understanding of the world, without any RDF notions at all
  • Semantic: The mapping of the human understanding to a set of RDF terms
  • Materialization: The set of terms from the semantic profile that appear in a particular materialization of it, regardless of format, including the potential for simplifications within the same semantic set of constructs.
  • API: The serialization rules to apply to get to a document that can be carried over a protocol between server and client.

eg:

Then there are the ontologies that the Semantic Profile also uses... (a) uses SKOS and SKOSXL, (b) uses CIDOC-CRM, RDF, RDFS, and linked.art. I think in schemaDesc these are BaseSpecifications, as distinct from Profiles (but the name could be improved, IMO)

rob-metalinkage commented 6 years ago

@azaroth42 - your exercise in separation of concerns is probably also applicable to datasets and distributions. I don't think there is much appetite for a very deep model of profiles however - and everything needs to be grounded in a specific requirement.

That said, the "role" qualifier on a profile definition (in word, SHACL, UML or whatever) could be used to discriminate between these levels of abstraction perhaps. PDF (text) forms of profile definitions tend to bundle multiple levels of abstraction in a single artefact, but machine readable artefacts such as the "semantic profile" could have a specific role.

Materialisations could be sub-profiles - and the use of production rules, for example to control JSON-LD serialisation etc could be included as a requirement - such profiles could inherit from an additional base specification which includes such constraints. (a profile is an interoperability contract - so predictability of serialisation is definitely a profile concern).

So currently, I think the profileDesc proposal can handle these concerns - and it is derived from the original proposed requirements in the UCR (the current exercise in discussing these in plenary has not yet had significant impact by adding, deleting or changing any of these). So procedure-wise we can a) test profileDesc against your own scenario - and suggest improvements. If your scenario is not adequately addressed however... b) look for holes in the set of Use Cases such that they do not sufficiently drive or explain such requirements. To my mind, it's possibly the specification of "API profile" interoperability rules applicable to dataset Distributions and services. You may need to look at DCAT practices here if this cannot be addressed as part of a profile specification.

NB. An open question for profileDesc is to what extent we define a canonical set of "roles" to describe the form and function of constraint expressions.

azaroth42 commented 6 years ago

Yes, agreed. I think that between the hierarchical approach and the use of roles, it will work nicely to allow various permutations of the above, or completely different thinking :)

kcoyle commented 6 years ago

@nicholascar I was hoping to find an example of what I mean by basic documentation, but I didn't find something analogous in a very short search. Just looking at a bunch of readme's kind of sets the stage. What I imagine is really quite simple, which is adding some paragraphs to the readme or a short introduction of profileDesc that says:

I'm thinking this ends up being a screen or screen and half of stuff. Kind of an "elevator talk" level that promotes the idea. Then we'll ask folks to read it and they may have questions or suggestions.

BTW if this already exists in another file then I missed it while poking around and I apologize, so let's make it more visible.

nicholascar commented 5 years ago

Removing tag profile-negotiation since this issue is all about profile-description

andrea-perego commented 3 years ago

Unlinking this issue from UCR, as decided in https://www.w3.org/2021/04/20-dxwg-minutes#r03

aisaac commented 3 years ago

@nicholascar @rob-metalinkage should this be linked to PROF (and moved to its github repo?)