jpullmann opened this issue 6 years ago
+1
I have put up a proposal at https://github.com/w3c/dxwg/tree/profiledesc-working/profiledesc/profileneg
Nick, profile negotiation is its own deliverable, as per the charter, and is so far based on a proposal by Lars and Ruben: https://profilenegotiation.github.io/I-D-Accept--Schema/I-D-accept-schema. It would be best not to start a separate effort, but to further what is already proposed. Also note that any "solutions" must be based on use cases and requirements. As I have mentioned before, we appear to be lacking use cases that would lead to the profileDesc work and this profile negotiation proposal.
I think that the work I’ve outlined above is compatible with Lars’ & Ruben’s work.
In the implementations we’ve used before, a _format Query String Argument (QSA) is used as an override for the Accept header, and a _view QSA is effectively the equivalent of Accept-Profile.
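A minimal Python sketch of that precedence rule, purely for illustration: the function name resolve_request and the example URL are hypothetical, headers are assumed to arrive as a plain dict with exact-case keys, and no q-value parsing is attempted.

```python
from urllib.parse import parse_qs, urlsplit


def resolve_request(url, headers):
    """Return the (profile, media_type) a request asks for.

    Assumed precedence: the _view and _format query string arguments
    override the Accept-Profile and Accept headers respectively.
    Header values are passed through unparsed (no q-value handling).
    """
    # parse_qs decodes '+' as a space, so a media type such as
    # application/rdf+xml must be sent as application/rdf%2Bxml.
    query = parse_qs(urlsplit(url).query)
    profile = headers.get("Accept-Profile")
    media_type = headers.get("Accept")
    if "_view" in query:
        profile = query["_view"][0]
    if "_format" in query:
        media_type = query["_format"][0]
    return profile, media_type


# The QSA wins over the Accept header; no profile was requested.
print(resolve_request("http://example.org/sample/AU239?_format=text/turtle",
                      {"Accept": "application/rdf+xml"}))
# -> (None, 'text/turtle')
```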
I would be able to implement Profile headers in the 6 or so APIs delivering different profiles in operation now if I can get persistent URIs for the profiles.
We have discussed the registration of Profiles within our Govt Linked Data WG as registration would give them a persistent URI. We will likely register a series of Profiles for purposes such as an energy sector profile of DCAT (2018) but currently we are unclear about whether a catalogue of known profiles is needed or even possible. We may make such a thing for Aust Gov-approved profiles.
I think we should be careful about trying to standardise a way of putting profile information into URIs/URLs by mandating the use of _format or _view. I agree that it's one way of doing it, but there are others as well. The URLs to the specific resource versions can be propagated using HTTP Link headers or HTML link elements (and of course as normal <a href=... links in the HTML pages).
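As a rough sketch of the Link-header option, the Python function below assembles a Link header value that advertises alternate representations. rel="alternate" and the type target attribute come from standard Web Linking (RFC 8288); carrying the profile URI in a profile="..." attribute is an assumption made for illustration, and the example URLs are hypothetical.

```python
def link_header(alternates):
    """Build a Link header value advertising alternate representations.

    alternates: iterable of (url, media_type, profile_uri) tuples.
    The profile="..." attribute is an illustrative assumption.
    """
    links = [
        f'<{url}>; rel="alternate"; type="{media_type}"; profile="{profile}"'
        for url, media_type, profile in alternates
    ]
    return ", ".join(links)


print(link_header([
    ("http://example.org/myCatalogue.dcat-ap-de.ttl", "text/turtle",
     "http://example.org/profile/dcat-ap-de"),
]))
```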
A registry for profiles sounds good. There could even be several, community-specific registries.
I agree that URI QSAs are only one of many ways of doing it, and perhaps even a secondary way, with HTTP headers being the primary one; however, I think such easy, human-friendly ways are very useful, hence my Use Case https://github.com/w3c/dxwg/issues/239
Since we are providing profile guidance, not just a single standard, I think we can base URI methods on (to be compatible with) HTTP methods.
I don't disagree that we need easy ways for humans to address profiled versions of documents. What I disagree with is to say that we should mandate the use of _format or _view. There are other ways we can do that in the URL, e.g. by using a syntax à la http://example.org/entity.profile.filetype (e.g. http://example.org/myCatalogue.dcat-ap-de.ttl identifying the Turtle serialisation of a DCAT catalogue using the DCAT-AP.de profile) instead of using http://example.org/myCatalogue?_view=dcat-ap-de&_format=turtle
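To illustrate why the URL layout can stay the server's own business, here is a small Python sketch in which both URL conventions above resolve to the same (entity, profile, media type) variant. The regular expression, the extension and token tables, and the function name parse_variant are all assumptions made for this example.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Convention 1 (hypothetical): /entity.profile.ext
DOTTED = re.compile(r"^/(?P<entity>[^/.]+)\.(?P<profile>[^/.]+)\.(?P<ext>[^/.]+)$")
EXTENSIONS = {"ttl": "text/turtle", "jsonld": "application/ld+json"}
# Convention 2 (hypothetical): /entity?_view=profile&_format=token
FORMAT_TOKENS = {"turtle": "text/turtle", "json-ld": "application/ld+json"}


def parse_variant(url):
    """Map either URL convention to an (entity, profile, media_type) triple."""
    parts = urlsplit(url)
    match = DOTTED.match(parts.path)
    if match:
        return match["entity"], match["profile"], EXTENSIONS[match["ext"]]
    query = parse_qs(parts.query)
    return (parts.path.strip("/"), query["_view"][0],
            FORMAT_TOKENS[query["_format"][0]])


a = parse_variant("http://example.org/myCatalogue.dcat-ap-de.ttl")
b = parse_variant("http://example.org/myCatalogue?_view=dcat-ap-de&_format=turtle")
print(a == b)  # -> True: same variant, two server-chosen URL layouts
```

Note that the dotted form cannot carry a profile token that itself contains a dot (such as DCAT-AP.de), which is one more reason to leave such choices to each server.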
Let's not break the Web; no spec should mandate the URL structure of a server.
A secondary way can just be to follow links, i.e., opening the main profile URI in the browser results in an HTML document with links to other representations (whose URIs the server can determine on its own).
+1 to @RubenVerborgh
I understand that the motivation for profile negotiation is that some users want to be able to use the same URL for versions of a dataset that conform to different profiles. Can anyone explain why some users want that? It does work to model the versions as representations of a resource, but it would also work to model them as separate resources (as in Ruben's suggestion above).
I understand that the motivation for profile negotiation is that some users want to be able to use the same URL for versions of a dataset that conform to different profiles.
No, the motivation is to have the same resource available in different profiles. And resources on the Web happen to be identified by URLs.
Note that each representation still can have its own URL. We will just provide the mechanism to get from resource to representation.
Can anyone explain why some users want that?
It does work to model the versions as representations of a resource, but it would also work to model them as separate resources (as in Ruben's suggestion above).
Both models are the exact same, really.
To understand this, it's important to see that the "representation" concept is a relative notion. E.g., in the sentence "A is a representation of B", B is the resource that A is a representation of. However, A is a resource in its own right.
An example to clarify:
Regardless of whether 2 has its own URL, all of the following hold:
I'm talking about the motivation to use negotiation. If the only motivation is to have the same resource available in conformance to different profiles, I don't see any particular reason to have profile negotiation that works like content negotiation. Having multiple profiles available is realized already by just offering a version of the dataset that conforms to one profile under one URL and a version that applies to another under another URL. Sorry I can't recall where it was expressed, but the idea of one URL for multiple profiles came from someone else in the group (maybe Lars?).
@agreiner
Create a way to negotiate choice of profile between clients and servers https://www.w3.org/TR/dcat-ucr/#RPFN
I'm talking about the motivation to use negotiation.
Negotiation is what gets clients to the representation with their preferred profile.
If the only motivation is to have the same resource available in conformance to different profiles
No, that's not the motivation. We can do that with existing technologies already.
What existing technologies don't do is automatically get a resource represented in a profile the client understands.
I don't see any particular reason to have profile negotiation that works like content negotiation.
It's just like negotiating between XML or JSON, except more fine-grained: https://ruben.verborgh.org/articles/fine-grained-content-negotiation/
Having multiple profiles available is realized already by just offering a version of the dataset that conforms to one profile under one URL and a version that applies to another under another URL.
But how does the client get from one to the other? Our answer: content negotiation.
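From the client's side, that means dereferencing one resource URI and stating preferences in headers. A minimal Python sketch that builds (but does not send) such a request with the standard library; Accept-Profile is the header name proposed in this discussion, the q-value syntax is borrowed from Accept, and the helper names are hypothetical.

```python
from urllib.request import Request


def accept_profile_value(prefs):
    """Format (profile_uri, q) pairs as an Accept-Profile header value."""
    return ", ".join(
        f"<{uri}>" if q == 1 else f"<{uri}>; q={q}" for uri, q in prefs
    )


def profile_request(resource_uri, prefs, media_type):
    """Build a GET request for one resource URI, negotiating profile and format."""
    return Request(resource_uri, headers={
        "Accept": media_type,
        "Accept-Profile": accept_profile_value(prefs),
    })


req = profile_request(
    "https://w3id.org/mediatype/text/csv",
    [("http://example.org/notavailable", 1), ("https://promsns.org/def/alt", 0.5)],
    "text/turtle",
)
# urllib stores header names via str.capitalize(), hence "Accept-profile".
print(req.get_header("Accept-profile"))
# -> <http://example.org/notavailable>, <https://promsns.org/def/alt>; q=0.5
```

Passing req to urllib.request.urlopen would actually send it; the server then picks the representation.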
Can we use DCAT as an example? I'm going to toss one out but it may not be correct. What if you have a dataset that has a whole lot of census-type data, which includes a wide range of elements that can be seen as about people (age, race, employment, location). Not every use of the data wants to make use of all of the columns in the table. Would different profiles be the way to get the view of the data that you desire? If so, could there be a direct correlation between profiles and services? Or could it be that one person's profile is another person's service?
Would different profiles be the way to get the view of the data that you desire?
Yes, profiles could be defined for views you want to see.
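A toy Python illustration of profiles as views, assuming (for simplicity) that each hypothetical profile does nothing but select a subset of columns; real profiles, as defined in this group, can impose much richer constraints. The profile URIs and the single data row are made up.

```python
# Hypothetical profiles, reduced here to the columns each one exposes.
PROFILES = {
    "http://example.org/profile/demographics": ["age", "race"],
    "http://example.org/profile/labour": ["age", "employment", "location"],
}


def apply_profile(rows, profile_uri):
    """Return the rows restricted to the columns the profile allows."""
    columns = PROFILES[profile_uri]
    return [{col: row[col] for col in columns} for row in rows]


rows = [{"age": 34, "race": "r1", "employment": "employed", "location": "L1"}]
print(apply_profile(rows, "http://example.org/profile/labour"))
# -> [{'age': 34, 'employment': 'employed', 'location': 'L1'}]
```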
If so, could there be a direct correlation between profiles and services? Or could it be that one person's profile is another person's service?
Well… services and resources are different abstractions of Web interfaces. The resource-oriented view is that you ask for a specific representation (tied to a profile) of a resource. The service-oriented view is that you send a command to a server that provides you with a representation conforming to a certain profile.
@agreiner scripsit:
Having multiple profiles available is realized already by just offering a version of the dataset that conforms to one profile under one URL and a version that applies to another under another URL. Sorry I can't recall where it was expressed, but the idea of one URL for multiple profiles came from someone else in the group (maybe Lars?).
Yes, that was me. Our use case is that we have two linked data services offering data about the same entities (e. g. persons and geographic entities) but using two different metadata profiles. The first one is our default one, served through the domain d-nb.info, e.g. http://d-nb.info/gnd/118601717/about/lds. The other one is a beefed-up version that also offers things like links to images at Wikimedia Commons, e.g. http://hub.culturegraph.org/entityfacts/118601717, and is used to drive the entity pages in the German Digital Library at https://www.deutsche-digitale-bibliothek.de/entity/118601717. The point is that both representations are about the same entity, identified by http://d-nb.info/gnd/118601717, and we want to serve both representations using the same URI. The solution to this would be profile negotiation.
@agreiner answered on the mailing list: Thanks, Lars,
Can you explain the value that you see in having the same URL for both datasets? What gives me pause here in particular is the mention that one is a beefed-up version with links to images. To me, that suggests that they are really two different resources; one clearly contains more stuff. Would you also make both datasets available under separate URLs for human consumption?
-Annette
@RubenVerborgh answered on the mailing list: Hi Annette,
Obviously not Lars, but my two cents below :-)
Can you explain the value that you see in having the same URL
So that we can link to the data, regardless of how it is represented.
I.e., for the same reason that we link to http://dbpedia.org/resource/Marie_Curie instead of http://dbpedia.org/data/Marie_Curie.json or http://dbpedia.org/page/Marie_Curie, since the first URL can be used for clients of any kind, whereas the two others are specific to certain types of client.
Furthermore, the first URL remains valid if new representations are added in the future.
for both datasets?
Nitpick: you call them "both datasets", implying that they are different datasets. While we probably shouldn't get too philosophical on what a dataset is and isn't, but Lars described his case as:
data about the same entities (e. g. persons and geographic entities) but using two different metadata profiles.
so the dataset seemed the same.
To me, that suggests that they are really two different resources;
Here I want to point out again that different representations A and B are different resources. However, Lars seems to imply that both A and B are representations of a dataset C.
The resource "the HTML version of X" is a different resource than "the JSON version of X"; however, both are representations of X.
So whether or not they are different resources (they are) does not seem the question here.
Best,
Ruben
And now I comment myself...
(@agreiner ) Can you explain the value that you see in having the same URL for both datasets?
At this level I'm not concerned with datasets but with arbitrary entities (identified by URIs) that can have 1..n representations (also identified by URIs that in most cases are also URLs). And from my point of view the entities belong to one dcat:Dataset. The representations of those entities (e.g. modelled using profile-1 and serialised as text/turtle, or modelled using profile-2 and serialised as application/ld+json) can then be collected and published as dcat:Distributions of the said dcat:Dataset.
(@RubenVerborgh ) I.e., for the same reason that we link to http://dbpedia.org/resource/Marie_Curie instead of http://dbpedia.org/data/Marie_Curie.json or http://dbpedia.org/page/Marie_Curie, since the first URL can be used for clients of any kind, whereas the two others are specific to certain types of client.
Furthermore, the first URL remains valid if new representations are added in the future.
+1
(@RubenVerborgh ) Nitpick: you call them "both datasets", implying that they are different datasets. While we probably shouldn't get too philosophical on what a dataset is and isn't, but Lars described his case as:
data about the same entities (e. g. persons and geographic entities) but using two different metadata profiles.
so the dataset seemed the same.
Yes, from my POV the entities are in the same dataset but the different representations are in different distributions.
(@agreiner) To me, that suggests that they are really two different resources
It's all about the same Auguste Rodin, identified by http://d-nb.info/gnd/118601717. And, as Ruben stated, there are several resources that work as representations of Rodin (or the metadata about him). They are targeted at different audiences and thus have different profiles, but they still describe (represent) the same entity. So, if you wish, you can see this as a move towards entity-based identification as opposed to representation-based identification.
For the main example in Use Case 239, I referred to the views or profiles of the metadata for a physical sample, AU239 (coincidental numbering). That sample has different metadata for different audiences (legacy XML format, current SOSA RDF etc.) but we certainly want the same URI for the sample. Currently we're using query string args to separate out the profiles but would like to support HTTP profile negotiation for smarter machine clients.
We have to use the same URI for the sample because one of the reasons we have URIs for samples at all is to de-duplicate references to the same sample. To do that, we need to know that it's really the same thing, which, although possible with multiple URIs, is much easier with a single one.
We're trying to say "metadata for sample AU239 is at URI http://pid.geoscience.gov.au/sample/AU239 regardless of the form of metadata you want".
(Finally updated link to my book, also putting it here: http://kcoyle.net/beforeAndAfter/index.html)
It feels to me that in some of the discussion there is confounding of profiles and serializations. That's something we need to be careful about - profiles and serializations are orthogonal.
The example:
for the same reason that we link to http://dbpedia.org/resource/Marie_Curie instead of http://dbpedia.org/data/Marie_Curie.json or http://dbpedia.org/page/Marie_Curie
makes me wonder if we haven't ventured into FRBR Work territory.[1] (I recall some mention of DCAT dataset being at the FRBR work level.) If anyone wants to do that, then the work and the distributions and the profiles will all have URIs; otherwise they have no existence in the web sense. The fact that we prefer to use the work URI in a query doesn't mean that the distributions and profiles do not have a URI: if they are on the web, they have a URI. It also seems that they will almost surely have a profile-based web identifier when they are the response to a content negotiation action. (Just as the result of each SPARQL query has a web identifier, albeit temporary in scope.) When one asks for "http://dbpedia.org/resource/Marie_Curie" in JSON, one presumably gets "http://dbpedia.org/resource/Marie_Curie.json". When one asks for "http://dbpedia.org/resource/Marie_Curie" as defined by ProfileX, then there needs to be a unique identifier for that data in that profile. Is defining this part of the content negotiation deliverable?
[1] The "undifferentiated work URI" is a hairy thing. If it is to be defined as part of content negotiation I urge caution. I need to add that I am one of the staunchest FRBR skeptics, and wrote an entire book on why I feel that way. All I can say is "there be dragons" so think it through very carefully.
(I recall some mention of DCAT dataset being at the FRBR work level.)
Yes - that is the unwritten premise behind my comments here https://github.com/w3c/dxwg/issues/55#issuecomment-394575989 https://github.com/w3c/dxwg/issues/52#issuecomment-394575481
(@kcoyle The link to the skeptical book is probably not the one you intended - could you fix it so I can look)
It feels to me that in some of the discussion there is confounding of profiles and serializations.
How would you define a serialization, and how is it different from a representation?
That's something we need to be careful about - profiles and serializations are orthogonal.
Do you mean profiles and media types? (If so, I agree.)
When one asks for "http://dbpedia.org/resource/Marie_Curie" as defined by ProfileX, then there needs to be a unique identifier for that data in that profile. Is defining this part of the content negotiation deliverable?
It might or might not have its own identifier (it's very useful if it does). But this deliverable should not specify anything about the form of this identifier, apart from the fact that it might exist.
Best,
Ruben
Yes, my serialization is your media type.
"It might or might not have its own identifier " - if there is no identifier, how will it be accessed/transmitted?
It's fine to have a "work" identifier (although again I caution that one needs to think very hard about what that identifier identifies), but any resource on the web has an identifier for the resource, not just the work. This is why I recommend that this work vs. actual thing distinction be thought through carefully, and that the relationship between the two be made clear. I don't know DCAT terribly well but this seems to be a difference between dataset and distribution. Obviously, the response to content negotiation is some form of distribution (in DCAT terms). In the FRBR sense, the work is an abstract concept with no physical/digital presence, and it is only when it is manifested (distributed) that there is a non-abstract thing. So as long as the URI for the dataset refers to an abstraction, that makes sense, but I'm not clear on what the non-abstraction consists of.
Yes, my serialization is your media type.
That might be a bit confusing then, because a serialization (as in "a concrete series of bytes representing a dataset") would be determined by multiple factors, such as media type, language, and profile.
"It might or might not have its own identifier " - if there is no identifier, how will it be accessed/transmitted?
Access through the non-negotiated identifier; indicate your preferences in headers. The server replies with the negotiated response.
but any resource on the web has an identifier for the resource, not just the work.
Any resource on the Web can have an identifier.
I don't know DCAT terribly well but this seems to be a difference between dataset and distribution.
A distribution is a representation of a dataset.
So as long as the URI for the dataset refers to an abstraction, that makes sense, but I'm not clear on what the non-abstraction consists of.
It refers to the dataset.
Ruben
I think you misunderstood my question about non-abstractions, so let me make it clearer.
As I understand it: DCAT dataset is an abstraction. It is only the distributions that are "real" - that is, that can be accessed. There is no access to a dataset EXCEPT through a distribution (in DCAT).
Therefore the non-abstraction above refers to a distribution (as defined in DCAT, which has some permanence) or some other "thing" that is returned from content negotiation. What I'm wondering is whether, in DCAT parlance, what is served through conneg is a distribution, or if it is something else, or if this isn't the right question?
Adding (from DCAT):
dcat:Catalog represents the catalog.
dcat:Dataset represents a dataset in a catalog.
dcat:Distribution represents an accessible form of a dataset as for example a downloadable file, an RSS feed or a web service that provides the data.
Therefore the non-abstraction above refers to a distribution (as defined in DCAT, which has some permanence)
OK.
What I'm wondering is whether, in DCAT parlance, what is served through conneg is a distribution, or if it is something else, or if this isn't the right question?
A distribution.
Ruben
@kcoyle & @RubenVerborgh: are you saying above that one can interpret a resource and profiles of it as a Dataset and Distributions of it?
If so, I think this is problematic. I see many more types of Resources and Profiles of them than DCAT will allow for. E.g., a Sample identified by URI with profiles of metadata for different purposes. The Resource + Profiles pattern holds here but not Dataset + Distributions.
I can think of other cases: Datasets are just too “big” a thing for many Resources to be sensibly interpreted as them.
@kcoyle & @RubenVerborgh: are you saying above that one can interpret a resource and profiles of it as a Dataset and Distributions of it?
I'm saying that a dataset is a resource, and that representations of that dataset conforming to certain profiles and serialized in a certain media type are distributions.
I see many more types of Resources and Profiles of them than DCAT will allow for.
That's fine. The mechanism is more generic than that. Just because a dataset is a resource does not mean that all resources are datasets.
The alignment of DCAT and FRBR [1] is incomplete: frbr:Work is effectively implemented in dcat:Dataset; frbr:Manifestation is implemented in dcat:Distribution; frbr:Expression is not implemented in DCAT, probably because in practice there is no artefact.
dct:conformsTo provides a hook to indicate the standard (which can be a schema or profile) that a resource conforms to. But in DCAT that is associated with dcat:Resource|dcat:Dataset and not with dcat:Distribution. How is it typically used?
In order to fully match FRBR we would need a way to indicate different schematic representations of a dataset (i.e. conforming to different profiles), alongside the different serializations (media types). Maybe add dct:conformsTo to dcat:Distribution, where it should be used to indicate the schema/profile/view that this representation takes.
[1] https://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records
dcat:Distribution represents an accessible form of a dataset as for example a downloadable file, an RSS feed or a web service that provides the data.
@kcoyle: now that we have added an explicit class for services (dcat:DataService and sub-classes), dcat:Distribution should not be used for a service. The definitions in the DCAT Editors Draft [1] have been tweaked slightly, but certainly could be further improved.
In an email that has not made it into this GitHub thread, @agreiner takes us back to Fielding's analysis of web architecture, which distinguishes only Resource and Representation. The issue with that is that it conflates schematic representation and serialization into the one step.
As I understand it, this requirement (Profile Negotiation) is aimed at allowing the schematic representation to be made explicit.
Meanwhile, @kcoyle has pointed out how this correlates with the FRBR conceptualization, which I've attempted to make more explicit two comments up.
The issue with that is that it conflates schematic representation and serialization into the one step.
Not conflates, but combines. Why is that an issue?
A representation can be negotiated over multiple dimensions, including media type, profile, language, etc.
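One simple way to negotiate over several dimensions at once, sketched in Python: score each available variant by multiplying the client's q-values per dimension and pick the highest. The multiplicative scoring is an assumption chosen for illustration (a common heuristic, not something this deliverable specifies), and the variants and short profile tokens are hypothetical.

```python
from math import prod


def negotiate(variants, prefs):
    """Pick the variant with the highest combined preference, or None.

    variants: list of dicts with "type", "profile" and "lang" keys.
    prefs: dimension -> {value: q}. A dimension absent from prefs means
    the client accepts anything there (q=1); a value absent from a stated
    dimension gets q=0.
    """
    def score(variant):
        return prod(prefs[dim].get(val, 0) if dim in prefs else 1
                    for dim, val in variant.items())

    best = max(variants, key=score, default=None)
    return best if best is not None and score(best) > 0 else None


variants = [
    {"type": "text/html", "profile": "alt", "lang": "en"},
    {"type": "text/turtle", "profile": "mt", "lang": "pl"},
    {"type": "text/turtle", "profile": "alt", "lang": "en"},
]
prefs = {"type": {"text/turtle": 1, "text/html": 0.5}, "profile": {"alt": 1}}
print(negotiate(variants, prefs))
# -> {'type': 'text/turtle', 'profile': 'alt', 'lang': 'en'}
```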
As I understand it, this requirement (Profile Negotiation) is aimed at allowing the schematic representation to be made explicit.
Yes.
Yes, combines - that is a better word. Not a problem, but an issue that is being teased out in the discussion here. Yes, multiple dimensions. FRBR privileges schematic representation very high up the conceptual stack, with its own class, while somehow the web had neglected it until now!
Actually, FRBR is based on documents and doesn't really fit well with data - the whole "work/expression" thing is very text-based, and even librarians complain that they can't fit it well into music, film, etc. Rather than reference FRBR, why not simply say that there is an abstraction of the dataset which has certain metadata functionality (e.g. describes the dataset apart from any specific instances of it), and there are one or more distributions which have byte-presence.
@nicholascar In my mind, a profile defines a distribution. Presumably conneg requests a distribution that conforms to a profile. I'm not sure what you mean by "a Sample identified by URI with profiles of metadata for different purposes." This seems to be analogous to the library case, where there is a physical thing (book) that is described by metadata; and there can be profiles governing what metadata is distributed. Is that the same?
I am a little lost in the multiple things being considered here.
Can I check this RDF as an assertion, based on Ruben's comment https://github.com/w3c/dxwg/issues/74#issuecomment-396729390:
@prefix : <http://example.org/> .   # assumed example namespace
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

:Dataset_X dcat:distribution :Distribution_Y .
:Distribution_Y
    # a dimension, currently missing
    dct:conformsTo :Profile_Z ;
    # another dimension, currently often catered for
    dct:format <some_format> ;
    # another example of a dimension
    # dct:language is indicated for Catalogue & Dataset only in DCAT1.0 but could be here due to no fixed DCT domain
    dct:language <http://id.loc.gov/vocabulary/iso639-1/en> .
You could get Distribution_Y by asking for Dataset_X with a distribution conforming to Profile_Z.
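The same lookup can be mocked up without an RDF toolkit. In this Python sketch the graph above is mirrored as plain dictionaries; :Distribution_W and :Profile_V are hypothetical additions for contrast, and find_distribution is an illustrative helper, not part of DCAT.

```python
# Plain-Python mirror of the Turtle above, plus one hypothetical
# distribution (:Distribution_W conforming to :Profile_V) for contrast.
DISTRIBUTIONS = {
    ":Dataset_X": [
        {"id": ":Distribution_Y", "conformsTo": ":Profile_Z",
         "format": "<some_format>", "language": "en"},
        {"id": ":Distribution_W", "conformsTo": ":Profile_V",
         "format": "<some_format>", "language": "en"},
    ],
}


def find_distribution(dataset, profile, fmt=None):
    """Return the first distribution of dataset conforming to profile
    (and, if fmt is given, serialised in fmt), or None if there is none."""
    for dist in DISTRIBUTIONS.get(dataset, []):
        if dist["conformsTo"] == profile and fmt in (None, dist["format"]):
            return dist["id"]
    return None


print(find_distribution(":Dataset_X", ":Profile_Z"))  # -> :Distribution_Y
```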
Interpretation using ProfileDesc:
dcat:Distribution
to a prof:ImplResourceDesc
due to no fixed DCT domains however usage makes them look related prof:resource
property, as per intended ProfileDesc use
:Profile_Z prof:resource :ImplResDesc_A ;
dct:conformsTo <A_validation_standard> ;
dct:format <some_other_format> ;
prof:resourceRole rolesvoc:ConformanceTest .
The RDF snippet works for me.
@kcoyle in https://github.com/w3c/dxwg/issues/74#issuecomment-396971886: I think there is an analog of sorts between my Sample example and your Book example, but I'm keen to avoid any inferencing whereby someone then thinks that a Sample (or a Book) is a Dataset. This would mean ensuring that, while a profile could govern the metadata distributed, what is distributed need not necessarily be a Distribution.
We can achieve this by having ProfileDesc as the general purpose ontology and ProfileDesc-like functionality allowed in DCAT, as indicated in my comment immediately above.
The test implementation of the Media Types Linked Data API I just set up implements both QSA- and HTTP-based format and language negotiation, within both QSA- and HTTP-based profile negotiation, e.g.:
Format:
Entry for https://w3id.org/mediatype/text/csv in RDF (turtle), default profile:
curl -L -H "Accept: text/turtle" http://w3id.org/mediatype/text/csv
Entry for https://w3id.org/mediatype/text/csv in HTML, ‘alternates’ profile (‘view’ as the API calls it) requested using the URI https://promsns.org/def/alt:
curl -L -H "Accept-Profile: <https://promsns.org/def/alt>" https://w3id.org/mediatype/text/csv
As above but in RDF (JSON-LD):
curl -L -H "Accept-Profile: <https://promsns.org/def/alt>" -H "Accept: application/rdf+json" https://w3id.org/mediatype/text/csv
Demo of weighted profile negotiation, with an unavailable view being ignored (rather than receiving HTTP 406):
curl -L -H "Accept-Profile: <http://example.org/notavailable>, <https://promsns.org/def/alt>; q=0.5" -H "Accept: application/rdf+json" https://w3id.org/mediatype/text/csv
Entry for https://w3id.org/mediatype/text/csv, ‘alternates’ profile indicated by QSA using a token, and Media Type also indicated by QSA:
curl -L http://w3id.org/mediatype/text/html\?_view=alternates\&_format=application/rdf+xml
Entry for https://w3id.org/mediatype/text/csv default profile with format indicated by QSA using token overriding HTTP Accept header:
curl -L -H "Accept: application/rdf+xml" http://w3id.org/mediatype/text/html\?_format=text/turtle
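A minimal Python sketch of server-side logic that would reproduce the weighted example above: parse the Accept-Profile value, try profiles in order of q-value, skip ones the server does not offer, and fall back to a default instead of answering 406. The parsing is simplified (it splits on commas, so it assumes no commas inside profile URIs), the default profile URI is hypothetical, and this is not the code behind the Media Types API.

```python
import re

# <uri> optionally followed by ;q=<qvalue> (qvalue as in HTTP: 0..1, 3 decimals).
ENTRY = re.compile(r"<(?P<uri>[^>]+)>\s*(?:;\s*q=(?P<q>[01](?:\.[0-9]{0,3})?))?")


def parse_accept_profile(value):
    """Parse '<uri>, <uri>; q=0.5' into (uri, q) pairs, highest q first."""
    prefs = []
    for item in value.split(","):
        match = ENTRY.match(item.strip())
        if match:
            q = float(match["q"]) if match["q"] else 1.0
            prefs.append((match["uri"], q))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_profile(value, available, default):
    """Return the best acceptable available profile, else the default."""
    for uri, q in parse_accept_profile(value):
        if q > 0 and uri in available:
            return uri
    return default


AVAILABLE = {"https://promsns.org/def/alt", "http://example.org/profile/mt"}
header = "<http://example.org/notavailable>, <https://promsns.org/def/alt>; q=0.5"
print(choose_profile(header, AVAILABLE, "http://example.org/profile/mt"))
# -> https://promsns.org/def/alt
```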
Language:
A Media Type, default view, HTML, in Polish:
https://w3id.org/mediatype/audio/3gpp?_lang=pl
A Media Type, default view, HTML, in Polish (preferred), using HTTP headers
curl -L -H "Accept: text/html" -H "Accept-Language: pl, en" https://w3id.org/mediatype/audio/3gpp
In this configuration, both the format and language dimensions of the resource depend on (are configured for) a particular profile. The alternates view of a Media Type shows all the options:
https://w3id.org/mediatype/audio/3gpp?_view=alternates
Note that the alternates view itself is only available in English, and that the non-HTML serialisations of the “mt” view, while supposedly being in Polish, actually are not. This is an error for the dataset implementer (me) to fix with RDF lang mappings, but the API is now operating correctly, with format and language negotiation working within profile negotiation via both QSA and HTTP.
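A sketch of that per-profile configuration in Python, with the alternates listing derived from it. The configuration is hypothetical (the real API offers more formats), but it mirrors the situation described above: the alternates view exists only in English, while the mt view is configured for English and Polish.

```python
# Hypothetical configuration: formats and languages are set per profile (view).
VIEWS = {
    "mt": {"formats": ["text/html", "text/turtle"], "languages": ["en", "pl"]},
    "alternates": {"formats": ["text/html", "text/turtle"], "languages": ["en"]},
}


def alternates(views):
    """List every (view, format, language) combination on offer."""
    return [
        (view, fmt, lang)
        for view, cfg in views.items()
        for fmt in cfg["formats"]
        for lang in cfg["languages"]
    ]


def is_offered(views, view, fmt, lang):
    """True if the requested combination appears in the configuration."""
    return (view, fmt, lang) in alternates(views)


print(len(alternates(VIEWS)))  # -> 6
print(is_offered(VIEWS, "alternates", "text/html", "pl"))  # -> False
```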
Not Implemented yet: A lot of things:
This is just a start.
A concrete use case that we have at the Getty today, that might help some of the commenters or at least provide an avenue for further clarifications:
The Getty Vocabularies are available as Linked Open Data. We currently provide exactly one schema which is a large super-set of SKOS. This schema is appropriate if you want to know absolutely every last thing that we know about the thesaurus terms. This is true for almost no one, it turns out ;)
We also manage data in the institution using a profile of CIDOC-CRM, with which SKOS is not very well-aligned natively but is trivially mappable. For consistency with these other holdings, we would like to make the vocabularies available at the same URIs using this profile. This demonstrates two points:
We also intend to have a pure SKOS profile for consumers that don't care about everything, but do need SKOS. Again, the format and profile are orthogonal in the same way, and the URI being the same is critical.
Please compare:
Rob's example above is what I would call the output from a "cross-walk" - data is converted from some database or metadata schema to another, and these schemas, in some cases, may be application profiles depending on their contents and functionality. It isn't clear to me if every use of metadata is a profile, however, so referring to profiles in the conneg work may not meet our definition of "profile", which is not (AFAIK) "any metadata schema." And not including non-profile metadata schemas may not satisfy the needs of conneg. We are going to have to spend some time on definitions. Note that we have (so far) defined profiles as:
A profile is a named set of constraints on one or more identified base specifications, including the identification of any implementing subclasses of datatypes, semantic interpretations, vocabularies, options and parameters of those base specifications necessary to accomplish a particular function.
I think this is more restrictive than "arbitrary metadata schema".
Wanting to serve the same data using a different metadata schema has the reputation of being lossy (in terms of absolute semantics). Rob says: "a different URI would mean a different concept." But I'm not so sure that we aren't talking about different concepts, although I realize that this becomes philosophical at a point. I believe this is what is bothering @agreiner. These are different datasets. That doesn't mean that you can't give an identifier to your data in all of its forms, but the same data served with different metadata schemas as a result of a conversion process is indeed a different dataset. But what is really troubling me is the use of "profile".
(I know that "schema" isn't a great word to use here - substitute "model" or whatever you prefer if it bothers you.)
I believe that our use case falls under that definition, in that both profiles have multiple base specifications, with subclasses, specific interpretations, identified vocabularies for the data instances and are there to accomplish particular functions.
We are not talking about two different real world concepts of "gold", and hence the URI must be the same. If RDF/XML and Turtle are not different datasets, but SKOS and CIDOC-CRM are, then it seems the philosophy of the content negotiation deliverable is not aligned with the DCAT deliverable.
As a reductio ad absurdum, if in model (A) the requirement is to use dc:title, and in model (B) the requirement is to use rdfs:label, but (A) and (B) are otherwise identical, that would make them two different datasets. This seems ... undesirable.
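To make the reductio concrete, here is a tiny Python sketch rendering one labelled resource as N-Triples under two hypothetical profiles that differ only in the label predicate. The resource URI is made up, and dc: is assumed to be the Dublin Core Elements 1.1 namespace.

```python
# Hypothetical profiles (A) and (B): identical except for the label predicate.
LABEL_PREDICATE = {
    "A": "http://purl.org/dc/elements/1.1/title",  # dc:title (assumed DC Elements 1.1)
    "B": "http://www.w3.org/2000/01/rdf-schema#label",  # rdfs:label
}


def render(resource_uri, label, profile):
    """Serialise one labelled resource as an N-Triples line under a profile.

    Assumes the label contains no quotes or backslashes (no escaping done).
    """
    return f'<{resource_uri}> <{LABEL_PREDICATE[profile]}> "{label}" .'


for p in ("A", "B"):
    print(render("http://example.org/concept/gold", "gold", p))
```

The subject URI is identical in both outputs; only the predicate changes.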
Rob, I do see the problem as the alignment between the use of the term "profile" in the two different deliverables. Whether we can align them, we'll have to see. The use of "application profile" in deliverable 2 (guidance for APs) becomes quite broad if we are to cover ANY metadata. Yet the conneg use case may need to allow for any metadata schema, not just those that meet our definition of "profile."
As for if (A) and (B) are different datasets, the definition that I find in the DCAT document is:
"A dataset in DCAT is defined as a "collection of data, published or curated by a single agent, and available for access or download in one or more formats". A dataset is a conceptual entity, and may be represented by one or more distributions that serialize the dataset for transfer. "
Earlier discussion has likened DCAT datasets to FRBR:work (lots of warts there), so your definition of dataset coincides with the DCAT one, and I used "dataset" perhaps more in line with DCAT's "distribution" which reads: "Definition: Connects a dataset to its available distributions." That definition seems to be undergoing discussion, and the emphasis on "serialization" may be an issue. I also note that "format" is dct:format, aka IANA media type. However, I'll try to be more in line with DCAT definitions in the future.
The Use Case that @azaroth42 presents sounds very similar to the one we have at the DNB, which was described above. Good to hear we're not alone!
de-tagging as Profile Negotiation
Profile negotiation [RPFN]
Create a way to negotiate choice of profile between clients and servers
Related requirements: Profile definition [RPFDF]
Related use cases: Detailing and requesting additional constraints (profiles) beyond content types [ID2] Standard APIs for metadata profile negotiation [ID30]