w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

profileDesc and the Guidance document #242

Open kcoyle opened 6 years ago

kcoyle commented 6 years ago

There are two possible approaches we can take to integrating profileDesc into our second deliverable. First would be a small-ish modification to the outline proposed by Lars. A W3C-structured document would exist as a separate item, not on recommendation track. (I'm not sure what options there are for a non-recommendation ontology, but I believe they do exist.)

In the other option, profileDesc is the main normative content of the Guidance document. This could be seen to fit the definition of the Guidance deliverable, which is: A definition of what is meant by an application profile and an explanation of one or more methods for publishing and sharing them. One could read this as meeting this outline:

agreiner commented 6 years ago

So the only thing required for a profile would be a URI?

andrea-perego commented 6 years ago

@agreiner said:

So the only thing required for a profile would be a URI?

I tend to agree that the URI is the only mandatory requirement for a profile.

For the rest (profile metadata and (in)formal definitions of a profile), these are very much related to the specific use scenarios. The SHOULD can be considered as "recommended", but possibly also as "conditional" (i.e., "mandatory" if the requirements of a use scenario need it). E.g., in a scenario where profiles need to be published and made discoverable (in a specific registry, catalogue or Web site), a subset of profile metadata may be considered as mandatory. Likewise, profile definitions/representations may be mandatory when a use scenario requires, e.g., support for validation.

In any case, it would be important to explain in the guidance document "why" we may need profile metadata and profile definitions/representations.
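The "conditional" reading of SHOULD described above can be sketched as a per-scenario set of required fields. This is only an illustration: the scenario names, the field names and the `missing_fields` helper are assumptions made for the example, not anything the group has agreed on.

```python
# Hypothetical scenarios and field names, chosen only for illustration.
REQUIRED_BY_SCENARIO = {
    "identify": {"uri"},                        # a URI is the only requirement
    "registry": {"uri", "title", "publisher"},  # discoverability needs metadata
    "validation": {"uri", "definitions"},       # validation needs a definition
}


def missing_fields(profile: dict, scenario: str) -> set:
    """Return the fields the scenario requires that the profile lacks."""
    required = REQUIRED_BY_SCENARIO[scenario]
    # A field counts as present only if it has a non-empty value.
    present = {key for key, value in profile.items() if value}
    return required - present


profile = {"uri": "https://example.org/profiles/my-ap", "title": "My AP"}
for scenario in REQUIRED_BY_SCENARIO:
    print(scenario, sorted(missing_fields(profile, scenario)))
```

The same profile is complete for one scenario and incomplete for another, which is the sense in which a recommended element becomes mandatory only when a use scenario needs it.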

kcoyle commented 6 years ago

The URI identifies the profile document and doesn't say anything about the content of the profile. I think that the guidance deliverable should cover both the document addressability and the function of the profile. I don't know if we'll use MUST/MAY/etc in the guidance document, or if it will make better sense to talk in terms of common aspects and functionality, as Andrea says above. I would consider the inclusion of the metadata terms to be used to be essential to a profile - I can't imagine what a profile could be without that.

aisaac commented 6 years ago

I fully agree with what's above, if my reading of it is correct. A profile guidance document should seek to make mostly recommendations at this stage, and metadata on profiles, described in something like ProfileDesc, would be a key part of it. Which is a point for having ProfileDesc folded into the Profile Guidance document one way or another.

aisaac commented 6 years ago

@kcoyle is this issue trying to formally https://docs.google.com/document/d/13hV2tJ6Kg2Hfe7e1BowY5QfCIweH9GxSCFQV1aWtOPg/edit?disco=AAAABxg2QNM ? If yes then we could include the discussion there, in this issue.

Anyway, for now my take is that it's fine having both the Profile Guidance and the Profile Description folded into one and the same deliverable. Considering the discussion above, they probably share the same fate. I see some points about maintenance issues, which may make me change my mind later. But for the moment I see that even if they are separate, any change to the Profile Description would require changing the Profile Guidance (as anyway the latter is going to have to include extensive examples of usage of the former). So the value of separating them may not be very high.

dr-shorthair commented 6 years ago

The scope of profiledesc is not limited to DCAT, so there is a slight risk that packaging it into DCAT guidance will taint it. But let's look at it down the track to determine if re-factoring is warranted and how easy that would be.

rob-metalinkage commented 6 years ago

+1 a profile just needs a URI

It only needs to be dereferenceable if there is a need to understand it, or some aspects of it, for example to perform validation, form generation or conformance with more general profiles. (If we write a piece of software which hard-codes all the assumptions and only needs to recognise the profile, then it isn't necessary to dereference, and the software writer can possibly use a text description.) I don't think we should care about this trivial case in the guidance, but it does apply to negotiation.
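The distinction above between recognising a profile URI and dereferencing it can be sketched as follows. The URIs, the handler name and the `fake_fetch` stand-in are hypothetical; a real client would replace `fake_fetch` with an actual HTTP request.

```python
from typing import Callable, Tuple

# Hypothetical profile URIs that this client was hard-coded against.
KNOWN_PROFILES = {
    "https://example.org/profiles/my-ap": "handle_my_ap",
}


def resolve_profile(uri: str, fetch: Callable[[str], str]) -> Tuple[str, str]:
    """Return (how, value): a hard-coded handler name, or a fetched description."""
    if uri in KNOWN_PROFILES:
        # Trivial case: recognising the identifier is enough, nothing is fetched.
        return ("recognised", KNOWN_PROFILES[uri])
    # Otherwise the client needs to understand the profile, so it dereferences.
    return ("dereferenced", fetch(uri))


def fake_fetch(uri: str) -> str:
    """Stand-in for an HTTP GET so the sketch runs offline."""
    return f"description retrieved from {uri}"


print(resolve_profile("https://example.org/profiles/my-ap", fake_fetch))
print(resolve_profile("https://example.org/profiles/other", fake_fetch))
```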

The key thing to remember is that profiledesc is explicitly motivated as a means to meet the requirements of profile guidance that cannot be easily satisfied by any other identified vocabulary.

So there are a few choices:

  1. don't attempt to satisfy these requirements
  2. publish profiledesc as a Rec in the "cleanest" form (standalone) if the W3C process allows this
  3. publish profiledesc as a Note and point to it from the GuidanceDoc with a SHOULD and a clause that a Rec that supersedes this SHOULD be used if available
  4. treat profiledesc as a normative part of the Guidance Doc
  5. align ProfileDesc as a module of DCAT defining a subclass of dcat:Resource under the Rec process

Option 2) is the best given the scope, but revisiting 5) may be the easiest and most appropriate from a process perspective. The cases I have seen where we can implement profiledesc will be able to implement DCAT profiles anyway, so we end up with profiles as intrinsically catalogued resources, which is not incompatible with the need to have stable URIs for these things.

kcoyle commented 6 years ago

@dr-shorthair The Guidance document is not DCAT Guidance, it is explicitly guidance for application profiles in general.

kcoyle commented 6 years ago

@rob-metalinkage You say: "The key thing to remember is that profiledesc is explicitly motivated as a means to meet the requirements of profile guidance that cannot be easily satisfied by any other identified vocabulary." We still have not clearly identified those requirements, and we need to do so in order to justify the creation of profileDesc under our charter. We must discuss the draft requirements we have and determine whether they are in scope and where they are best satisfied.

kcoyle commented 6 years ago

@rob-metalinkage The list of choices, 1-5 above, is important. Here are some comments:

  1. "don't attempt to satisfy these requirements"
     This doesn't necessarily mean abandoning the profileDesc work; it could be that we recommend that it become part of a general "profiles" effort by W3C. That would separate it from DCAT and make it more visible all around. A community group would be where to place this work.

  2. "publish profiledesc as a Rec in the "cleanest" form (standalone) if the W3C process allows this"
     I don't think "rec" will be available to us; still asking about that. I believe there is a way to do this by forming a community group, creating a note, which then is promoted to "rec track", although that would most likely take place after this group finishes its work, so it may not be possible to point to it from our deliverables.

  3. "publish profiledesc as a Note and point to it from the GuidanceDoc with a SHOULD and a clause that a Rec that supersedes this SHOULD be used if available"
     This was one of the options that Peter and I discussed with Phil. The profileDesc note would, however, need to have a supporting community group.

  4. "treat profiledesc as a normative part of the Guidance Doc"
     This is awkward because the guidance doc is going to be "technology neutral" - that is, it isn't going to assume any particular technology for profiles, and it is not going to be tied directly to DCAT. So having an RDF ontology as part of that doc ... doesn't really fit. (nb: This isn't a criticism of profileDesc; I have some concerns about the coherency of the charter itself.)

  5. "align ProfileDesc as a module of DCAT defining a subclass of dcat:Resource under the Rec process"
     Although the two are technically compatible, this might make profileDesc less discoverable by users of non-DCAT profiles. So to my mind this "fits" but might be too narrow a context.

aisaac commented 6 years ago

Among the options proposed (thanks @rob-metalinkage !) I'm still in favour of 4. It's easier to handle for us right now. And we can still decide to separate later, depending on progress and adoption of ProfileDesc.

Re. @kcoyle 's point on 4 about ProfileDesc not being 'technology neutral', this is indeed something to consider. But in fact I'm going to seize the opportunity to make a point I wanted to make: I think we should treat ProfileDesc as a Data Model, perhaps one backed by an RDF model, but not a 'pure' RDF ontology file. That wouldn't prevent us from creating an RDF ontology, and having this be the main representation. But we should be completely ready to have Descriptions of Profiles in XML, non-LD JSON, etc.
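To make the "data model first, serialisation second" idea concrete, here is a minimal sketch in which one in-memory description is written out as both plain JSON and XML. The class, its three fields and the element names are invented for the example and are not ProfileDesc terms.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class ProfileDescription:
    """Hypothetical minimal model; field names are illustrative only."""
    uri: str
    title: str
    profile_of: str  # URI of the specification the profile is based on


def to_json(desc: ProfileDescription) -> str:
    """Serialise the model as plain (non-LD) JSON."""
    return json.dumps(asdict(desc), sort_keys=True)


def to_xml(desc: ProfileDescription) -> str:
    """Serialise the same model as a small XML document."""
    root = ET.Element("profile", {"uri": desc.uri})
    ET.SubElement(root, "title").text = desc.title
    ET.SubElement(root, "profileOf").text = desc.profile_of
    return ET.tostring(root, encoding="unicode")


desc = ProfileDescription(
    uri="https://example.org/profiles/my-ap",
    title="My application profile",
    profile_of="https://example.org/specs/base-vocabulary",
)
print(to_json(desc))
print(to_xml(desc))
```

An RDF serialisation could sit alongside these two functions without changing the model itself, which is the point of keeping the model separate from any one technology.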

kcoyle commented 6 years ago

@aisaac I like your suggestion that we treat it as a data model, without specifying a particular technology. That could mean that we define concepts in the text, and the RDF ontology could be a note showing one implementation (I think that implementations can be a note; need to check). The bulk, then, of what we provide would be to show the motivation for needing a profile description (DCAT-AP and Europeana are great examples). Then we have to decide between "should" and "may" but the fact is that we have few instances of profiles that use a description to date, so "may" might be appropriate for now because we are introducing something new. If it is taken up as an actual W3C effort then it could rise to "should" level.

kcoyle commented 6 years ago

Peter suggests Core Public Service Vocabulary AP ( https://ec.europa.eu/isa2/solutions/core-public-service-vocabulary-application-profile-cpsv-ap_en ). Thanks.

larsgsvensson commented 6 years ago

I also agree with @aisaac that we should be technology-neutral. And I also would point out that (to me) the URI identifies the profile as a resource that can have many representations. With that terminology there is no profile document since any document will be (only) a representation of the profile resource. (I. e. let's use the web architecture terminology)

agreiner commented 6 years ago

So here is where I'm confused: if we write a spec that says all you need to have a profile is a URI, then absolutely everything on the web is potentially a profile. If we think a machine-readable form of a profile is a necessity, and the only guarantee we make for a programmer is that the thing has a URI, we haven't really offered the programmer anything useful to code against. I'm still not sure that a machine-readable profile needs to be anything other than a schema, though. That's what I'm hoping some real use cases can clarify. What do we want profiles to do that a schema doesn't already do?

kcoyle commented 6 years ago

@agreiner I agree that saying that it only needs a URI is not sufficient for the resource to be a profile. It's like saying that it only needs an http URL to be a web page, but never defining HTML so that web pages can be created.

Early on in this group there was what felt to me to be a consensus that defining a standard schema for a profile was beyond our abilities or time frame. However, the guidance document should provide a strong definition of what makes a "schema" or a "document" a profile. It's the content, just like the content of a PDF can be a curriculum vitae, a doctoral dissertation, a technical report, etc. We have recognizable document types based on their content. To me, a profile has a certain content and particular purposes. That is what we need to cover in the guidance.

As for having a schema, Dublin Core has the beginnings of this, and I have started work in the Dublin Core area which will hopefully become a work item for that group. It includes a vocabulary for defining simple (i.e. core) profiles.

kcoyle commented 6 years ago

If we take the view proposed by @aisaac and @larsgsvensson of providing a model rather than an ontology, it seems that the outline would look like:

(Note: I'm not sure about saying in the profile guidance that a profile SHOULD have a description because that is something new that we will be recommending but none of our existing profiles have them. I would see the profile description as being RECOMMENDED but applying mainly to future development.)

nicholascar commented 6 years ago

We could emulate the PROV methods by having a Profiles/Profiling model (PROV-DM: conceptual model rather than data model perhaps?) and then normative implementations in different technologies such as profileDesc as an ontology (like PROV-O) that adhere to the conceptual model and others, if people are interested in making them.

aisaac commented 6 years ago

I think I like @nicholascar's proposal, though the PROV DM is perhaps too complex for my taste - and probably beyond what this WG can do. But certainly we should seek to identify the primitives of a model for profiles.

To answer @agreiner ’s point, I think we should add to the last post by @kcoyle an item that says: A profile MUST be based on some existing data standard(s).

kcoyle commented 6 years ago

@aisaac " A profile MUST be based on some existing data standard(s)." What I worry about is that we'll be asked to define "existing data standard" and some people will want "It's gotta be W3C or ISO..." and others will be "My institution decided this, so it's a standard." The Dublin Core approach is that a profile must use terms defined somewhere, without saying where, how, or what is considered a valid term to use. Can we be that vague?

aisaac commented 6 years ago

@kcoyle I don't know. I thought our definition was using "standard". Anyway, whatever we name it, a standard MUST be based on something :-)

andrea-perego commented 6 years ago

@larsgsvensson said:

I also agree with @aisaac that we should be technology-neutral. And I also would point out that (to me) the URI identifies the profile as a resource that can have many representations. With that terminology there is no profile document since any document will be (only) a representation of the profile resource. (I. e. let's use the web architecture terminology)

Well, we have the example of DCAT (and ADMS), where we have a resource (dcat:Dataset / adms:Asset) and its representations (via dcat:Distribution / adms:AssetDistribution). Analogously, the profile document will correspond to profile metadata (title, description, publisher), whereas profile representations will correspond to the human- and/or machine-readable definitions of the profile (the list of re-used classes and properties, and their constraints).
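The analogy can be sketched as a profile resource that carries its own metadata plus a list of representations, much as a dataset carries distributions. The class names, fields, media types and URLs below are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Representation:
    """One human- or machine-readable form of the profile."""
    media_type: str
    url: str


@dataclass
class Profile:
    """The profile as a resource: metadata plus zero or more representations."""
    uri: str
    title: str
    publisher: str
    representations: List[Representation] = field(default_factory=list)

    def select(self, accepted: List[str]) -> Optional[Representation]:
        """Return the first representation matching the client's preference order."""
        for media_type in accepted:
            for rep in self.representations:
                if rep.media_type == media_type:
                    return rep
        return None


profile = Profile(
    uri="https://example.org/profiles/my-ap",
    title="My application profile",
    publisher="Example Org",
    representations=[
        Representation("text/html", "https://example.org/profiles/my-ap.html"),
        Representation("text/turtle", "https://example.org/profiles/my-ap.ttl"),
    ],
)
print(profile.select(["text/turtle", "text/html"]))
```

The URI identifies the `Profile` itself, and each `Representation` is only one way of delivering it, which matches the web architecture terminology used earlier in the thread.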

andrea-perego commented 6 years ago

@aisaac said:

@kcoyle I don't know. I thought our definition was using "standard". Anyway, whatever we name it, a standard MUST be based on something :-)

What about using "specification" instead of "standard"?

aisaac commented 6 years ago

@kcoyle @andrea-perego next time I'm lazy just correct me without further ado ;-) Indeed our agreed definition (https://www.w3.org/2017/dxwg/wiki/ProfileContext#Draft_common_definition_for_.22profile.22) says "specification" so there's no discussion to be had!

@andrea-perego I believe that your analogy is analogous to the one that led me to my earlier diagram (https://docs.google.com/drawings/d/1dHkpwKwUwMgS1RqSCTPO3uOoRiY_qNk0z5bhXJlYi4Y/) so I agree ;-)

kcoyle commented 6 years ago

I've thought more about this, and here is an outline that is based on some metadata profiles that I am familiar with. Although none of our use cases specifies the need for administrative data, we can probably assume that it is, if not required, at least a good idea.

This outline separates into two separate sections the expression of the profile itself, and information about the profile (meta information). What goes where is up for discussion, but this seems to me to break the profile description into sections that would make sense to the reader: what is a profile? how do I make it available? (publishing), how is it administered?, what needs to be said about its context? All of this could be managed in a single profile document, or there could be separate documents that are linked. If done as a single document then a profile is self-describing, which may have some advantages for sharing across platforms (e.g. for profiles that are not in RDF or that cannot link to external documents reliably).

Introduction (non-normative)

Definitions (normative)

Profile publication (normative)

Examples (non-normative)

Administrative and descriptive metadata (normative)

Example(s) (non-normative)

Bibliography

rob-metalinkage commented 6 years ago

I do have a question (or suggestion) about the metadata side of things...

a lot of this is standard "cataloguing" and would be covered by the DCAT move to an abstract dcat:Resource

so I'm wondering if we shouldn't push this into a more general statement about using available descriptive metadata. The bits that are not already covered by DCAT are of course "Relationship to vocabs or other profiles".

This tends to support the alignment of profileDesc with DCAT - which could be pointed as a recommended way to use DCAT to meet these goals.

kcoyle commented 6 years ago

@rob-metalinkage In general I agree that much of the admin stuff is standard admin metadata. I'm not saying that profile work should invent a new ontology for it, just that a profile should have this information about itself. Since we aren't developing a machine-readable profile description, we don't need to specify how this is to be coded.

It isn't clear to me yet if we'll be recommending any ontologies to fulfill these functions. That's one of the big decisions that needs to be made as the guidance document develops. Given our time frame, however, I suspect that the document will remain at the level of a conceptual model, and may be a support document for future W3C work (which I think is needed).

aisaac commented 6 years ago

Thanks @kcoyle I like this new outline. There are a lot of similarities with what we had agreed on earlier, and the differences provide some added value, clearly.

The only downside I see is with the definitions. I think we can't avoid putting our definition for profile upfront. On the other hand, we could put the other definitions at the end of the document, and have hyperlinks to them in earlier parts of the text. This way they wouldn't stand in the flow.

aisaac commented 6 years ago

From @kcoyle :

Antoine, this can all get re-arranged as needed, so feel free to make suggestions or provide a new version. I didn't think extremely hard about order.