Profile negotiation [RPFN]

jpullmann commented 6 years ago

Profile negotiation [RPFN]

Create a way to negotiate choice of profile between clients and servers

akuckartz commented 6 years ago

:+1:

nicholascar commented 6 years ago

I have put up a proposal at https://github.com/w3c/dxwg/tree/profiledesc-working/profiledesc/profileneg

kcoyle commented 6 years ago

Nick, profile negotiation is its own deliverable, as per the charter, and is so far based on a proposal by Lars and Ruben: https://profilenegotiation.github.io/I-D-Accept--Schema/I-D-accept-schema. It would be best not to start a separate effort, but to further what is already proposed. Also note that any "solutions" must be based on use cases and requirements. As I have mentioned before, we appear to be lacking use cases that would lead to the profileDesc work and this profile negotiation proposal.

nicholascar commented 6 years ago

I think that the work I’ve outlined above is compatible with Lars’ & Ruben’s work.

In the implementations we’ve used before, a _format Query String Argument (QSA) is used as an override for the Accept header, and a _view QSA is effectively the equivalent of Accept-Profile.
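As a rough illustration of that precedence, the sketch below resolves the requested view and format from the QSAs first and falls back to the headers. It is only a sketch under assumptions: the function name is hypothetical, the headers are modelled as a plain case-sensitive dict, and mapping a _view token to a profile URI is left out.

```python
from urllib.parse import urlsplit, parse_qs


def requested_view_and_format(url, headers):
    """Return (profile, media_type); query string arguments win over headers."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each key to a list of values; take the first one if present.
    view = query.get("_view", [None])[0]
    fmt = query.get("_format", [None])[0]
    profile = view or headers.get("Accept-Profile")
    media_type = fmt or headers.get("Accept")
    return profile, media_type


print(requested_view_and_format(
    "http://example.org/myCatalogue?_view=dcat-ap-de&_format=text/turtle",
    {"Accept": "application/rdf+xml"},
))
```

Note that a literal "+" in a query string value is decoded as a space by parse_qs, so a media type such as application/rdf+xml would need percent-encoding when passed as a QSA.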

I would be able to implement Profile headers in the 6 or so APIs delivering different profiles in operation now if I can get persistent URIs for the profiles.

We have discussed the registration of Profiles within our Govt Linked Data WG as registration would give them a persistent URI. We will likely register a series of Profiles for purposes such as an energy sector profile of DCAT (2018) but currently we are unclear about whether a catalogue of known profiles is needed or even possible. We may make such a thing for Aust Gov-approved profiles.

larsgsvensson commented 6 years ago

I think we should be careful about trying to standardise a way of putting profile information into URIs/URLs by mandating the use of _format or _view. I agree that it's one way of doing it, but there are others as well. The URLs to the specific resource versions can be propagated using http Link-headers or html link elements (and of course as normal <a href=... in the html pages). A registry for profiles sounds good. There could even be several, community-specific registries.
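To make the Link-header option concrete, here is a minimal sketch of how a server might advertise its own URLs for the profiled versions. The helper name and the example.org URLs are hypothetical, and the "profile" target attribute on the link is an assumption for illustration rather than something this thread has agreed on.

```python
def link_header(alternates):
    """Format (url, media_type, profile_uri) triples as one Link header value."""
    parts = []
    for url, media_type, profile_uri in alternates:
        parts.append(
            f'<{url}>; rel="alternate"; type="{media_type}"; profile="{profile_uri}"'
        )
    return ", ".join(parts)


print(link_header([
    ("http://example.org/cat.p1.ttl", "text/turtle", "http://example.org/profile/1"),
    ("http://example.org/cat.p2.ttl", "text/turtle", "http://example.org/profile/2"),
]))
```

The server stays free to choose whatever URL structure it likes; the client only follows the advertised links.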

nicholascar commented 6 years ago

I agree that URI QSAs are only one of many ways of doing it, and perhaps even a secondary way with HTTP headers being the primary. However, I think such easy, human-usable ways are very useful, hence my Use Case https://github.com/w3c/dxwg/issues/239

Since we are providing profile guidance, not just a single standard, I think we can base URI methods on (to be compatible with) HTTP methods.

larsgsvensson commented 6 years ago

I don't disagree that we need easy ways for humans to address profiled versions of documents. What I disagree with is to say that we should mandate the use of _format or _view. There are other ways we can do that in the URL, e.g. by using a syntax à la http://example.org/entity.profile.filetype (e. g. http://example.org/myCatalogue.dcat-ap-de.ttl identifying the turtle serialisation of a dcat-catalogue using the DCAT-AP.de profile) instead of using http://example.org/myCatalogue?_view=dcat-ap-de&_format=turtle
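Both URL styles carry the same three pieces of information, which is why neither needs to be mandated. The sketch below, with hypothetical helper names and a made-up extension table, reads each style into the same (entity, profile, format) triple.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical mapping from file extensions to format tokens.
EXTENSIONS = {"ttl": "turtle"}


def from_dotted_path(url):
    """Read http://host/entity.profile.filetype."""
    name = urlsplit(url).path.rsplit("/", 1)[-1]
    # Split from the right so an entity name containing dots stays intact.
    entity, profile, extension = name.rsplit(".", 2)
    return entity, profile, EXTENSIONS.get(extension, extension)


def from_query(url):
    """Read http://host/entity?_view=profile&_format=format."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    entity = parts.path.rsplit("/", 1)[-1]
    return entity, query["_view"][0], query["_format"][0]


a = from_dotted_path("http://example.org/myCatalogue.dcat-ap-de.ttl")
b = from_query("http://example.org/myCatalogue?_view=dcat-ap-de&_format=turtle")
print(a == b)
```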

RubenVerborgh commented 6 years ago

Let's not break the Web; no spec should mandate the URL structure of a server.

A secondary way can just be to follow links, i.e., opening the main profile URI in the browser results in an HTML document with links to other representations (for which the server can determine the URIs of its own).

larsgsvensson commented 6 years ago

+1 to @RubenVerborgh

agreiner commented 6 years ago

I understand that the motivation for profile negotiation is that some users want to be able to use the same URL for versions of a dataset that conform to different profiles. Can anyone explain why some users want that? It does work to model the versions as representations of a resource, but it would also work to model them as separate resources (as in Ruben's suggestion above).

RubenVerborgh commented 6 years ago

I understand that the motivation for profile negotiation is that some users want to be able to use the same URL for versions of a dataset that conform to different profiles.

No, the motivation is to have the same resource available in different profiles. And resources on the Web happen to be identified by URLs.

Note that each representation still can have its own URL. We will just provide the mechanism to get from resource to representation.

Can anyone explain why some users want that?

to get from a resource to its representations
to see what other representations a resource has

It does work to model the versions as representations of a resource, but it would also work to model them as separate resources (as in Ruben's suggestion above).

Both models are the exact same, really.

To understand this, it's important to see that the "representation" concept is a relative notion. E.g., in the sentence "A is a representation of B", B is the resource that A is the representation of. However, A is a resource in its own right.

An example to clarify:

1. http://example.org/weather/amsterdam/2018-06-01 is the weather report for Amsterdam for 1 June
2. http://example.org/weather/amsterdam/2018-06-01.html is the weather report for Amsterdam for 1 June in HTML

Regardless of whether 2 has its own URL, all of the following hold:

1 is a resource
2 is a resource
2 is a representation of 1

agreiner commented 6 years ago

I'm talking about the motivation to use negotiation. If the only motivation is to have the same resource available in conformance to different profiles, I don't see any particular reason to have profile negotiation that works like content negotiation. Having multiple profiles available is realized already by just offering a version of the dataset that conforms to one profile under one URL and a version that applies to another under another URL. Sorry I can't recall where it was expressed, but the idea of one URL for multiple profiles came from someone else in the group (maybe Lars?).

akuckartz commented 6 years ago

@agreiner

Create a way to negotiate choice of profile between clients and servers https://www.w3.org/TR/dcat-ucr/#RPFN

RubenVerborgh commented 6 years ago

I'm talking about the motivation to use negotiation.

Negotiation is what gets clients to the representation with their preferred profile.

If the only motivation is to have the same resource available in conformance to different profiles

No, that's not the motivation. We can do that with existing technologies already.

What existing technologies don't do, is automatically getting a resource represented in a profile the client understands.

I don't see any particular reason to have profile negotiation that works like content negotiation.

It's just like negotiating between XML or JSON, except more fine-grained: https://ruben.verborgh.org/articles/fine-grained-content-negotiation/

Having multiple profiles available is realized already by just offering a version of the dataset that conforms to one profile under one URL and a version that applies to another under another URL.

But how does the client get from one to the other? Our answer: content negotiation.

kcoyle commented 6 years ago

Can we use DCAT as an example? I'm going to toss one out but it may not be correct. What if you have a dataset that has a whole lot of census-type data, which includes a wide range of elements that can be seen as about people (age, race, employment, location). Not every use of the data wants to make use of all of the columns in the table. Would different profiles be the way to get the view of the data that you desire? If so, could there be a direct correlation between profiles and services? Or could it be that one person's profile is another person's service?

RubenVerborgh commented 6 years ago

Would different profiles be the way to get the view of the data that you desire?

Yes, profiles could be defined for views you want to see.

If so, could there be a direct correlation between profiles and services? Or could it be that one person's profile is another person's service?

Well… services and resources are different abstractions of Web interfaces. The resource-oriented view is that you ask for a specific representation (tied to a profile) of a resource. The service-oriented view is that you send a command to a server that provides you with a representation conforming to a certain profile.

larsgsvensson commented 6 years ago

@agreiner scripsit:

Having multiple profiles available is realized already by just offering a version of the dataset that conforms to one profile under one URL and a version that applies to another under another URL. Sorry I can't recall where it was expressed, but the idea of one URL for multiple profiles came from someone else in the group (maybe Lars?).

Yes, that was me. Our use case is that we have two linked data services offering data about the same entities (e. g. persons and geographic entities) but using two different metadata profiles. The first one is our default one served through the domain d-nb.info, e. g. http://d-nb.info/gnd/118601717/about/lds. The other one is a beefed-up version also offering things like links to images at wikimedia commons, e. g. http://hub.culturegraph.org/entityfacts/118601717 that is used to drive the entity pages in the German Digital Library at https://www.deutsche-digitale-bibliothek.de/entity/118601717. The point is that both representations are about the same entity, identified by http://d-nb.info/gnd/118601717 and we want to serve both representations using the same URI. The solution to this would be profile negotiation.

larsgsvensson commented 6 years ago

@agreiner answered on the mailing list: Thanks, Lars,

Can you explain the value that you see in having the same URL for both datasets? What gives me pause here in particular is the mention that one is a beefed-up version with links to images. To me, that suggests that they are really two different resources; one clearly contains more stuff. Would you also make both datasets available under separate URLs for human consumption?

-Annette

larsgsvensson commented 6 years ago

@RubenVerborgh answered on the mailing list: Hi Annette,

Obviously not Lars, but my two cents below :-)

Can you explain the value that you see in having the same URL

So that we can link to the data, regardless of how it is represented.

I.e., for the same reason that we link to http://dbpedia.org/resource/Marie_Curie instead of http://dbpedia.org/data/Marie_Curie.json or http://dbpedia.org/page/Marie_Curie, since the first URL can be used for clients of any kind, whereas the two others are specific to certain types of client.

Furthermore, the first URL remains valid if new representations are added in the future.

for both datasets?

Nitpick: you call them "both datasets", implying that they are different datasets. While we probably shouldn't get too philosophical on what a dataset is and isn't, Lars described his case as:

data about the same entities (e. g. persons and geographic entities) but using two different metadata profiles.

so the dataset seemed the same.

To me, that suggests that they are really two different resources;

Here I want to point out again that different representations A and B are different resources. However, Lars seems to imply that both A and B are representations of a dataset C.

The resource "the HTML version of X" is a different resource than "the JSON version of X"; however, both are representations of X.

So whether or not they are different resources (they are) does not seem the question here.

Best,

Ruben

larsgsvensson commented 6 years ago

And now I comment myself...

(@agreiner ) Can you explain the value that you see in having the same URL for both datasets?

At this level I'm not concerned with datasets but with arbitrary entities (identified by URIs) that can have 1..n representations (also identified by URIs that in most cases are also URLs). And from my point of view the entities belong to one dcat:Dataset. The representations of those entities (e. g. modelled using profile-1 and serialised as text/turtle, or modelled using profile-2 and serialised as application/ld+json) can then be collected and published as dcat:Distributions of the said dcat:Dataset.

(@RubenVerborgh ) I.e., for the same reason that we link to http://dbpedia.org/resource/Marie_Curie instead of http://dbpedia.org/data/Marie_Curie.json or http://dbpedia.org/page/Marie_Curie, since the first URL can be used for clients of any kind, whereas the two others are specific to certain types of client.

Furthermore, the first URL remains valid if new representations are added in the future.

+1

(@RubenVerborgh ) Nitpick: you call them "both datasets", implying that they are different datasets. While we probably shouldn't get too philosophical on what a dataset is and isn't, Lars described his case as:

data about the same entities (e. g. persons and geographic entities) but using two different metadata profiles.

so the dataset seemed the same.

Yes, from my POV the entities are in the same dataset but the different representations are in different distributions.

(@agreiner) To me, that suggests that they are really two different resources

It's all about the same Auguste Rodin, identified by http://d-nb.info/gnd/118601717. And, as Ruben stated, there are several resources that work as representations of Rodin (or the metadata about him). They are targeted at different audiences and thus have different profiles, but they are still describing (representing) the same entity. So if you wish you can see this as a move towards entity-based identification as opposed to representation-based identification.

nicholascar commented 6 years ago

For the main example in Use Case 239, I referred to the views or profiles of the metadata for a physical sample, AU239 (coincidental numbering). That sample has different metadata for different audiences (legacy XML format, current SOSA RDF etc.) but we certainly want the same URI for the sample. Currently we're using query string args to separate out the profiles but would like to support HTTP profile negotiation for smarter machine clients.

We have to use the same URI for the sample as one of the reasons we have URIs for samples at all is to de-duplicate references to the same sample and to do that we need to know that it's really the same thing which, although possible with multiple URIs, is much easier with a single one.

We're trying to say "metadata for sample AU239 is at URI http://pid.geoscience.gov.au/sample/AU239 regardless of the form of metadata you want".

kcoyle commented 6 years ago

(Finally updated link to my book, also putting it here: http://kcoyle.net/beforeAndAfter/index.html)

It feels to me that in some of the discussion there is confounding of profiles and serializations. That's something we need to be careful about - profiles and serializations are orthogonal.

The example:

for the same reason that we link to http://dbpedia.org/resource/Marie_Curie instead of http://dbpedia.org/data/Marie_Curie.json or http://dbpedia.org/page/Marie_Curie

makes me wonder if we haven't ventured into FRBR Work territory.[1] (I recall some mention of DCAT dataset being at the FRBR work level.) If anyone wants to do that, then the work and the distributions and the profiles all will have URIs, otherwise they have no existence in the web sense. Whether we prefer to use the work URI in a query doesn't mean that the distributions and profiles do not have a URI - if they are on the web, they have a URI. It also seems that they will almost surely have a profile-based web identifier when they are the response to a content negotiation action. (Just as the result of each SPARQL query has a web identifier, albeit temporary in scope.) When one asks for "http://dbpedia.org/resource/Marie_Curie" in json, one presumably gets "http://dbpedia.org/resource/Marie_Curie.json". When one asks for "http://dbpedia.org/resource/Marie_Curie" as defined by ProfileX, then there needs to be a unique identifier for that data in that profile. Is defining this part of the content negotiation deliverable?

[1] The "undifferentiated work URI" is a hairy thing. If it is to be defined as part of content negotiation I urge caution. I need to add that I am one of the staunchest FRBR skeptics, and wrote an entire book on why I feel that way. All I can say is "there be dragons" so think it through very carefully.

dr-shorthair commented 6 years ago

(I recall some mention of DCAT dataset being at the FRBR work level.)

Yes - that is the unwritten premise behind my comments here https://github.com/w3c/dxwg/issues/55#issuecomment-394575989 https://github.com/w3c/dxwg/issues/52#issuecomment-394575481

(@kcoyle The link to the skeptical book is probably not the one you intended - could you fix it so I can look)

RubenVerborgh commented 6 years ago

It feels to me that in some of the discussion there is confounding of profiles and serializations.

How would you define a serialization, and how is it different from a representation?

That's something we need to be careful about - profiles and serializations are orthogonal.

Do you mean profiles and media types? (If so, I agree.)

When one asks for "http://dbpedia.org/resource/Marie_Curie" as defined by ProfileX, then there needs to be a unique identifier for that data in that profile. Is defining this part of the content negotiation deliverable?

It might or might not have its own identifier (it's very useful if it does). But this deliverable should not specify anything about the form of this identifier, apart from the fact that it might exist.

Best,

Ruben

kcoyle commented 6 years ago

Yes, my serialization is your media type.

"It might or might not have its own identifier " - if there is no identifier, how will it be accessed/transmitted?

It's fine to have a "work" identifier (although again I caution that one needs to think very hard about what that identifier identifies), but any resource on the web has an identifier for the resource, not just the work. This is why I recommend that this work vs. actual thing be thought through carefully, and the relationship between those be clear. I don't know DCAT terribly well but this seems to be a difference between dataset and distribution. Obviously, the response to content negotiation is some form of distribution (in DCAT terms). In the FRBR sense, the work is an abstract concept with no physical/digital presence, and it is only when it is manifested (distributed) that there is a non-abstract thing. So as long as the URI for the dataset refers to an abstraction, that makes sense, but I'm not clear on what the non-abstraction consists of.

RubenVerborgh commented 6 years ago

Yes, my serialization is your media type.

That might be a bit confusing then, because a serialization (as in "a concrete series of bytes representing a dataset") would be determined by multiple factors, such as media type, language, and profile.

"It might or might not have its own identifier " - if there is no identifier, how will it be accessed/transmitted?

Access through the non-negotiated identifier; indicate your preferences in headers. The server replies with the negotiated response.
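A minimal sketch of that exchange, seen from the server side, could look as follows. The representation table, the example.org URLs and the function name are hypothetical, and the Content-Profile response header is an assumption made for illustration; Content-Location and Vary are ordinary HTTP headers.

```python
# Hypothetical representations of one resource, keyed by (profile, media type).
REPRESENTATIONS = {
    ("http://example.org/profile/a", "text/turtle"): "http://example.org/thing.a.ttl",
    ("http://example.org/profile/b", "text/turtle"): "http://example.org/thing.b.ttl",
}


def negotiate(request_headers):
    """Return (status, response_headers) for a GET on the generic URI."""
    profile = request_headers.get("Accept-Profile", "").strip("<>")
    media_type = request_headers.get("Accept")
    location = REPRESENTATIONS.get((profile, media_type))
    if location is None:
        return 406, {}
    return 200, {
        "Content-Type": media_type,
        "Content-Profile": f"<{profile}>",   # assumed header name
        "Content-Location": location,        # optional: the representation's own URL
        "Vary": "Accept, Accept-Profile",
    }


status, headers = negotiate({
    "Accept": "text/turtle",
    "Accept-Profile": "<http://example.org/profile/b>",
})
print(status, headers["Content-Location"])
```

The client never needs to know the representation's URL in advance; if the server chooses to mint one, Content-Location is one place to expose it.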

but any resource on the web has an identifier for the resource, not just the work.

Any resource on the Web can have an identifier.

I don't know DCAT terribly well but this seems to be a difference between dataset and distribution.

A distribution is a representation of a dataset.

So as long as the URI for the dataset refers to an abstraction, that makes sense, but I'm not clear on what the non-abstraction consists of.

It refers to the dataset.

Ruben

kcoyle commented 6 years ago

I think you misunderstood my question about non-abstractions, so let me make it clearer.

As I understand it: DCAT dataset is an abstraction. It is only the distributions that are "real" - that is, that can be accessed. There is no access to a dataset EXCEPT through a distribution (in DCAT).

Therefore the non-abstraction above refers to a distribution (as defined in DCAT, which has some permanence) or some other "thing" that is returned from content negotiation. What I'm wondering is whether, in DCAT parlance, what is served through conneg is a distribution, or if it is something else, or if this isn't the right question?

Adding (from DCAT):

dcat:Catalog represents the catalog
dcat:Dataset represents a dataset in a catalog.
dcat:Distribution represents an accessible form of a dataset as for example a downloadable file, an RSS feed or a web service that provides the data.

RubenVerborgh commented 6 years ago

Therefore the non-abstraction above refers to a distribution (as defined in DCAT, which has some permanence)

OK.

What I'm wondering is whether, in DCAT parlance, what is served through conneg is a distribution, or if it is something else, or if this isn't the right question?

A distribution.

Ruben

nicholascar commented 6 years ago

@kcoyle & @RubenVerborgh: are you saying above that one can interpret a resource and profiles of it as a Dataset and Distributions of it?

If so, I think this is problematic. I see many more types of Resources and Profiles of them than DCAT will allow for. E.g., a Sample identified by URI with profiles of metadata for different purposes. The Resource + Profiles pattern holds here but not Dataset +Distributions.

I can think of other cases: Datasets are just too “big” a thing for many Resources to be sensibly interpreted as them

RubenVerborgh commented 6 years ago

@kcoyle & @RubenVerborgh: are you saying above that one can interpret a resource and profiles of it as a Dataset and Distributions of it?

I'm saying that a dataset is a resource, and that representations of that dataset conforming to certain profiles and serialized in a certain media type are distributions.

I see many more types of Resources and Profiles of them than DCAT will allow for.

That's fine. The mechanism is more generic than that. It's not because a dataset is a resource, that all resources are datasets.

dr-shorthair commented 6 years ago

The alignment of DCAT and FRBR [1] is incomplete -

frbr:Work is effectively implemented in dcat:Dataset
frbr:Manifestation is implemented in dcat:Distribution
frbr:Expression is not implemented in DCAT - probably because in practice there is no artefact

dct:conformsTo provides a hook to indicate the standard (which can be a schema or profile) that a resource conforms to. But in DCAT that is associated with dcat:Resource|dcat:Dataset and not with dcat:Distribution. How is it typically used?

In order to fully match FRBR we would need a way to indicate different schematic representations of a dataset (i.e. conforming to different profiles), alongside the different serializations (media-types). Maybe add dct:conformsTo to dcat:Distribution where it should be used to indicate the schema/profile/view that this representation takes.

[1] https://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records

dr-shorthair commented 6 years ago

dcat:Distribution represents an accessible form of a dataset as for example a downloadable file, an RSS feed or a web service that provides the data.

@kcoyle - now we have added an explicit class for services (dcat:DataService and sub-classes) dcat:Distribution should not be used for a service. The definitions in the DCAT Editors Draft [1] have been tweaked slightly, but certainly could be further improved.

[1] https://w3c.github.io/dxwg/dcat/

dr-shorthair commented 6 years ago

In an email that has not made it into this GitHub thread, @agreiner takes us back to Fielding's analysis of web architecture, which distinguishes only Resource and Representation. The issue with that is that it conflates schematic representation and serialization into the one step.

As I understand it, this requirement (Profile Negotiation) is aimed at allowing the schematic representation to be made explicit.

Meanwhile, @kcoyle has pointed out how this correlates with the FRBR conceptualization, which I've attempted to make more explicit two comments up.

RubenVerborgh commented 6 years ago

The issue with that is that it conflates schematic representation and serialization into the one step.

Not conflates, but combines. Why is that an issue?

A representation can be negotiated over multiple dimensions, including media type, profile, language, etc.

As I understand it, this requirement (Profile Negotiation) is aimed at allowing the schematic representation to be made explicit.

Yes.

dr-shorthair commented 6 years ago

Yes, combines - that is a better word. Not a problem, but an issue that is being teased out in the discussion here. Yes, multiple dimensions. FRBR privileges schematic representation very high up the conceptual stack, with its own class, while somehow the web had neglected it until now!

kcoyle commented 6 years ago

Actually, FRBR is based on documents and doesn't really fit well with data - the whole "work/expression" thing is very text-based, and even librarians complain that they can't fit it well into music, film, etc. Rather than reference FRBR, why not simply say that there is an abstraction of the dataset which has certain metadata functionality (e.g. describes the dataset apart from any specific instances of it), and there are one or more distributions which have byte-presence.

kcoyle commented 6 years ago

@nicholascar In my mind, a profile defines a distribution. Presumably conneg requests a distribution that conforms to a profile. I'm not sure what you mean by "a Sample identified by URI with profiles of metadata for different purposes." This seems to be analogous to the library case, where there is a physical thing (book) that is described by metadata; and there can be profiles governing what metadata is distributed. Is that the same?

nicholascar commented 6 years ago

I am a little lost in the multiple things being considered here.

Can I check this RDF as an assertion, based on Ruben's comment https://github.com/w3c/dxwg/issues/74#issuecomment-396729390:

:Dataset_X dcat:distribution :Distribution_Y .

:Distribution_Y 
    # a dimension, currently missing
    dct:conformsTo :Profile_Z ;  
    # another dimension, currently often catered for 
    dct:format <some_format> ; 
    # another example of a dimension
    # dct:language is indicated for Catalogue & Dataset only in DCAT1.0 but could be here due to no fixed DCT domain
    dct:language <http://id.loc.gov/vocabulary/iso639-1/en> .

You could get Distribution_Y by asking for Dataset_X with a distribution conforming to Profile_Z.
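That lookup can be sketched without any RDF library by holding the assertions as plain tuples. Distribution_W and Profile_V are hypothetical additions so the filter has something to exclude, and the function name is made up for this illustration.

```python
# The assertions above as (subject, predicate, object) tuples.
TRIPLES = {
    (":Dataset_X", "dcat:distribution", ":Distribution_Y"),
    (":Distribution_Y", "dct:conformsTo", ":Profile_Z"),
    (":Distribution_Y", "dct:format", "<some_format>"),
    # Hypothetical second distribution conforming to another profile.
    (":Dataset_X", "dcat:distribution", ":Distribution_W"),
    (":Distribution_W", "dct:conformsTo", ":Profile_V"),
}


def distributions_conforming_to(dataset, profile, triples=TRIPLES):
    """Return the dataset's distributions that conform to the given profile."""
    distributions = {
        o for s, p, o in triples if s == dataset and p == "dcat:distribution"
    }
    return sorted(
        d for d in distributions if (d, "dct:conformsTo", profile) in triples
    )


print(distributions_conforming_to(":Dataset_X", ":Profile_Z"))
```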

Interpretation using ProfileDesc:

no inference is drawn linking a dcat:Distribution to a prof:ImplResourceDesc due to no fixed DCT domains however usage makes them look related

the Profile referenced by the Distribution could be linked to validating tools via the prof:resource property, as per intended ProfileDesc use

:Profile_Z prof:resource :ImplResDesc_A ;
dct:conformsTo <A_validation_standard> ;  
dct:format <some_other_format> ; 
prof:resourceRole rolesvoc:ConformanceTest .

RubenVerborgh commented 6 years ago

The RDF snippet works for me.

nicholascar commented 6 years ago

@kcoyle in https://github.com/w3c/dxwg/issues/74#issuecomment-396971886: I think there is an analog of sorts between my Sample example and your Book example but I'm keen to avoid any inferencing whereby someone then thinks that a Sample (or a Book) is then a Dataset. This would mean ensuring that while a profile could govern metadata distributed, what is distributed need not necessarily be a Distribution.

We can achieve this by having ProfileDesc as the general purpose ontology and ProfileDesc-like functionality allowed in DCAT, as indicated in my comment immediately above.

nicholascar commented 6 years ago

The test implementation of the Media Types Linked Data API I just set up implements both QSA & HTTP format & language negotiation within QSA & HTTP profile negotiation, e.g.:

Format:
Entry for https://w3id.org/mediatype/text/csv in RDF (turtle), default profile:
curl -L -H "Accept: text/turtle" http://w3id.org/mediatype/text/csv

Entry for https://w3id.org/mediatype/text/csv in HTML, ‘alternates’ profile (‘view’ as the API calls it) requested using the URI https://promsns.org/def/alt:
curl -L -H "Accept-Profile: <https://promsns.org/def/alt>" https://w3id.org/mediatype/text/csv

As above but in RDF (JSON-LD):
curl -L -H "Accept-Profile: <https://promsns.org/def/alt>" -H "Accept: application/rdf+json" https://w3id.org/mediatype/text/csv

Demo of weighted profile neg with not available view being ignored (not receiving HTTP 406):
curl -L -H "Accept-Profile: <http://example.org/notavailable>, <https://promsns.org/def/alt>; q=0.5" -H "Accept: application/rdf+json" https://w3id.org/mediatype/text/csv

Entry for https://w3id.org/mediatype/text/csv, ‘alternates’ profile indicated by QSA using token & Media Type also indicated by QSA:
curl -L http://w3id.org/mediatype/text/html\?_view=alternates\&_format=application/rdf+xml

Entry for https://w3id.org/mediatype/text/csv default profile with format indicated by QSA using token overriding HTTP Accept header:
curl -L -H "Accept: application/rdf+xml" http://w3id.org/mediatype/text/html\?_format=text/turtle
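The weighted profile negotiation demo above, where an unavailable view is skipped rather than answered with HTTP 406, can be approximated with the sketch below. It is not the code behind the API: the function names are hypothetical, and splitting on commas is a simplification that would break for profile URIs containing a comma.

```python
def parse_accept_profile(value):
    """Return [(profile_uri, q)] sorted by descending q; q defaults to 1.0."""
    choices = []
    for item in value.split(","):
        uri_part, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in params:
            name, _, raw = param.partition("=")
            if name.strip() == "q":
                q = float(raw)
        choices.append((uri_part.strip("<>"), q))
    # sorted() is stable, so equal weights keep their order from the header.
    return sorted(choices, key=lambda c: c[1], reverse=True)


def choose_profile(header_value, available):
    """Pick the highest-weighted requested profile that the server offers."""
    for uri, q in parse_accept_profile(header_value):
        if q > 0 and uri in available:
            return uri
    return None  # the caller decides between a default view and HTTP 406


header = "<http://example.org/notavailable>, <https://promsns.org/def/alt>; q=0.5"
print(choose_profile(header, {"https://promsns.org/def/alt"}))
```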

Language:
A Media Type, default view, HTML, in Polish:
https://w3id.org/mediatype/audio/3gpp?_lang=pl

A Media Type, default view, HTML, in Polish (preferred), using HTTP headers
curl -L -H "Accept: text/html" -H "Accept-Language: pl, en" https://w3id.org/mediatype/audio/3gpp

In this configuration, both the format and language dimensions of the resource are dependent on (configured for a particular) profile. The alternates view of a Media Type shows all the options:

https://w3id.org/mediatype/audio/3gpp?_view=alternates

Note that the alternates view itself is only available in English and that the non-HTML serialisations of the “mt” view, while supposedly being in Polish, actually are not. This is an error for the dataset implementer (me) to fix with RDF lang mappings but the API is operating correctly now with both format & lang within profile QSA and HTTP-based negotiation.

Not Implemented yet: A lot of things:

HTTP-based requests for profiles available for instance
Profile Description Ontology terminology - still using the Alternates View RDF

This is just a start.

azaroth42 commented 6 years ago

A concrete use case that we have at the Getty today, that might help some of the commenters or at least provide an avenue for further clarifications:

The Getty Vocabularies are available as Linked Open Data. We currently provide exactly one schema which is a large super-set of SKOS. This schema is appropriate if you want to know absolutely every last thing that we know about the thesaurus terms. This is true for almost no one, it turns out ;)

We also manage data in the institution using a profile of CIDOC-CRM, with which SKOS is not very well-aligned natively but is trivially mappable. For consistency with these other holdings, we would like to make the vocabularies available at the same URIs using this profile. This demonstrates two points:

The media type is orthogonal to the profile, as you could ask for the full mega-skos profile in turtle, json-ld or rdf/xml, and the CRM profile in any of those formats too.
The URI is critically important to be the same for vocabulary entries, as a different URI would mean a different concept.

We also intend to have a pure SKOS profile for consumers that don't care about everything, but do need SKOS. Again, the format and profile are orthogonal in the same way, and the URI being the same is critical.

Please compare:

http://vocab.getty.edu/aat/300011021
http://aat-web-services-staging.getty.edu/aat/300011021 (alpha at current date for feedback, please don't use for anything!)

kcoyle commented 6 years ago

Rob's example above is what I would call the output from a "cross-walk" - data is converted from some database or metadata schema to another, and these schemas, in some cases, may be application profiles depending on their contents and functionality. It isn't clear to me if every use of metadata is a profile, however, so referring to profiles in the conneg work may not meet our definition of "profile", which is not (AFAIK) "any metadata schema." And not including non-profile metadata schemas may not satisfy the needs of conneg. We are going to have to spend some time on definitions. Note that we have (so far) defined profiles as:

A profile is a named set of constraints on one or more identified base specifications, including the identification of any implementing subclasses of datatypes, semantic interpretations, vocabularies, options and parameters of those base specifications necessary to accomplish a particular function.

I think this is more restrictive than "arbitrary metadata schema".

Wanting to serve the same data using a different metadata schema has the reputation of being lossy (in terms of absolute semantics). Rob says: "a different URI would mean a different concept." But I'm not so sure that we aren't talking about different concepts, although I realize that this becomes philosophical at a point. I believe this is what is bothering @agreiner. These are different datasets. That doesn't mean that you can't give an identifier to your data in all of its forms, but the same data served with different metadata schemas as a result of a conversion process is indeed a different dataset. But what is really troubling me is the use of "profile".

(I know that "schema" isn't a great word to use here - substitute "model" or whatever you prefer if it bothers you.)

azaroth42 commented 6 years ago

I believe that our use case falls under that definition, in that both profiles have multiple base specifications, with subclasses, specific interpretations, identified vocabularies for the data instances and are there to accomplish particular functions.

We are not talking about two different real world concepts of "gold", and hence the URI must be the same. If RDF/XML and Turtle are not different datasets, but SKOS and CIDOC-CRM are, then it seems the philosophy of the content negotiation deliverable is not aligned with the DCAT deliverable.

As a reductio ad absurdum, if in model (A) the requirement is to use dc:title, and in model (B) the requirement is to use rdfs:label but (A) and (B) are otherwise identical, that would be two different datasets. This seems ... undesirable.
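A rough sketch of that (A)/(B) case, using plain tuples as triples. The example.org subject URI and the helper name are hypothetical, and I am assuming dc: means the Dublin Core elements namespace:

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def swap_predicate(triples, old, new):
    """Rewrite one predicate and leave subjects and objects untouched."""
    return [(s, new if p == old else p, o) for (s, p, o) in triples]


# Model (A): hypothetical concept described with dc:title.
model_a = [("http://example.org/concept/1", DC_TITLE, "gold")]

# Model (B): identical data, except that rdfs:label is required instead.
model_b = swap_predicate(model_a, DC_TITLE, RDFS_LABEL)

# Converting back recovers model (A) exactly, so nothing was lost.
round_trip = swap_predicate(model_b, RDFS_LABEL, DC_TITLE)
```

The subject URI is the same in both models and the conversion round-trips without loss, which is why calling them two different datasets seems odd to me.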

kcoyle commented 6 years ago

Rob, I do see the problem as the alignment between the use of the term "profile" in the two different deliverables. Whether we can align them, we'll have to see. The use of "application profile" in deliverable 2 (guidance for APs) becomes quite broad if we are to cover ANY metadata. Yet the conneg use case may need to allow for any metadata schema, not just those that meet our definition of "profile."

As for if (A) and (B) are different datasets, the definition that I find in the DCAT document is:

"A dataset in DCAT is defined as a "collection of data, published or curated by a single agent, and available for access or download in one or more formats". A dataset is a conceptual entity, and may be represented by one or more distributions that serialize the dataset for transfer. "

Earlier discussion has likened DCAT datasets to FRBR:work (lots of warts there), so your definition of dataset coincides with the DCAT one, and I used "dataset" perhaps more in line with DCAT's "distribution" which reads: "Connects a dataset to its available distributions." That definition seems to be undergoing discussion, and the emphasis on "serialization" may be an issue. I also note that "format" is dct:format, aka IANA media type. However, I'll try to be more in line with DCAT definitions in the future.

larsgsvensson commented 6 years ago

The use case that @azaroth42 presents sounds very similar to the one we have at the DNB, which was described above. Good to hear we're not alone!

nicholascar commented 4 years ago

de-tagging as Profile Negotiation

w3c / dxwg

Profile negotiation [RPFN] #74