Closed dr-shorthair closed 6 years ago
Possible motions for this week's DCAT meeting:
dcat:Dataset and dcat:Catalog
Thanks for drafting this proposal, @dr-shorthair .
I have a couple of comments:
- I don't think dcat:Catalog should be made a subclass of dctype:Service. Its definition in DCAT 1.0 says "A data catalog is a curated collection of metadata about datasets." So, although it may correspond to a catalogue service (as done in GeoDCAT-AP), it can simply be a "static" collection, a mechanism for grouping datasets based on some criteria. I'm aware of examples of this use, e.g., the list of datasets produced by a project/activity.
- As proposed, dcat:Catalog can include records of "data services" only. I agree this is the case for ISO 19115, but I think we should relax this range constraint to allow any type of service, in order to support other use cases. E.g., in DataCite, a service is not necessarily a data service. Moreover, we have examples in Europe of catalogues of online/offline public services (whose metadata follow the Core Public Service vocabulary).
- We should allow dcat:Catalog to include metadata records of other dcat:Catalog's (I mean, not as sub-catalogues). This is supported in ISO 19115.

OK - remove sub-class axiom from dcat:Catalog
That limitation was not the intention. I was just taking an incremental approach: right now we know about Data Services, so we add them; later we may know about other things, so we can add them then.
A challenge in modeling is whether to be parsimonious - only model the things we know about now - or else attempt to provide a generic home for things we haven't yet encountered, but have a hunch about. These days I tend to err towards the former, and take good care of what we do know about, and leave the unknowns for the future. I lean towards having well-named predicates so went for dcat-s:dataService to link to a dcat-s:DataService by analogy with dcat:dataset that links to dcat:Dataset. It might be generalized a little, but then we end up at dcat:WebService ... which is already deprecated! (Can you un-deprecate something??)
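To make the analogy concrete, here is a minimal sketch in plain Python that represents a catalog as a set of triples, with one predicate per kind of member. The dcat-s namespace URI and every `http://example.org/` identifier are placeholders assumed for illustration; the thread does not specify them.

```python
# The DCAT namespace is the published one; the "dcat-s" URI is a placeholder,
# because the thread does not say what it expands to.
DCAT = "http://www.w3.org/ns/dcat#"
DCAT_S = "http://example.org/dcat-s#"  # hypothetical
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # illustrative resources only

triples = {
    (EX + "catalog", RDF_TYPE, DCAT + "Catalog"),
    # Existing pattern: dcat:dataset links a catalog to a dcat:Dataset.
    (EX + "catalog", DCAT + "dataset", EX + "rainfall"),
    (EX + "rainfall", RDF_TYPE, DCAT + "Dataset"),
    # Proposed pattern: dcat-s:dataService links a catalog to a dcat-s:DataService.
    (EX + "catalog", DCAT_S + "dataService", EX + "rainfall-api"),
    (EX + "rainfall-api", RDF_TYPE, DCAT_S + "DataService"),
}


def members(catalog, predicate):
    """Return the sorted objects linked from `catalog` by `predicate`."""
    return sorted(o for s, p, o in triples if s == catalog and p == predicate)


datasets = members(EX + "catalog", DCAT + "dataset")
services = members(EX + "catalog", DCAT_S + "dataService")
```

With a dedicated predicate per member type, a consumer can ask for datasets or for services without first inspecting the type of every catalogued resource.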
Resolutions from DCAT team telecon:
Definition of the class(es) for data services (or other services) to be determined.
Given @dr-shorthair's representation of the chosen solution (as discussed in last week's call) and available at https://github.com/w3c/dxwg/wiki/Cataloguing-data-services#chosen-solution, I suggest we rename 'Service' to 'DataService' to make it more specific.
Following the discussion on the call today, I will revise the reasons behind calling it 'Service' rather than 'DataService'. I also raised that I think it might be too complex to have a typology of services, as it might not be complete. Rather, we could use a generic service class and characterise it with specific attributes.
I proposed just 'Service' to allow for cataloging of other kinds of services (authentication?, entertainment?). We also might have just used dctype:Service but I feel we should have DCAT classes in our backbone;
I also saw the possibility of having no specializations, but DataDistributionService seems like a central concern, and since we also want some links to Datasets and Distributions, it is much easier to axiomatize these with a named class.
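As a rough illustration of why a named class helps with axioms, the sketch below applies RDFS-style domain and range entailment to a single link between a service and a dataset. The property name `ex:servesDataset` and the `ex:` prefix are hypothetical, invented only for this example; the thread does not define the actual properties.

```python
RDF_TYPE = "rdf:type"

# Domain and range axioms for a hypothetical property. Having a named class
# such as DataDistributionService gives these axioms something to point at.
DOMAIN = {"ex:servesDataset": "ex:DataDistributionService"}
RANGE = {"ex:servesDataset": "dcat:Dataset"}


def infer_types(triples):
    """Return the rdf:type triples entailed by the domain and range axioms."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, RDF_TYPE, DOMAIN[p]))  # subject gets the domain class
        if p in RANGE:
            inferred.add((o, RDF_TYPE, RANGE[p]))  # object gets the range class
    return inferred


data = {("ex:rainfall-api", "ex:servesDataset", "ex:rainfall")}
inferred = infer_types(data)
```

Without a named class there would be nothing for the domain axiom to reference, so the same constraint would have to be expressed through attribute values instead.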
What happens if one has one or more datasets and related services, but does not define a DCAT catalog as such? As I recall, one can use DCAT to define datasets on their own, not within catalogs. Also, when I look at sites with services (e.g. https://www.ny.gov/services/health) most of the services are not related to data sets. Do you expect that these services could be included in DCAT catalogs?
@kcoyle The relationship between Catalogs and Datasets is already covered in issue https://github.com/w3c/dxwg/issues/62. As far as I can see, there is nothing in the specification of DCAT that would stop someone from creating and describing a Dataset without having a Catalog. I see no problem with allowing a Catalog to include Services that are not linked to Datasets.
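A small sketch of that point, assuming a plain-Python triple representation with illustrative `ex:` identifiers: a dataset description is complete on its own, and whether any catalog lists it is a separate question that can be answered by a simple query.

```python
RDF_TYPE = "rdf:type"

triples = {
    ("ex:catalog", RDF_TYPE, "dcat:Catalog"),
    ("ex:catalog", "dcat:dataset", "ex:rainfall"),
    ("ex:rainfall", RDF_TYPE, "dcat:Dataset"),
    # Described as a dataset, but not listed in any catalog.
    ("ex:soil-samples", RDF_TYPE, "dcat:Dataset"),
}


def uncatalogued(graph):
    """Return datasets that no resource points to via dcat:dataset."""
    datasets = {s for s, p, o in graph if p == RDF_TYPE and o == "dcat:Dataset"}
    listed = {o for s, p, o in graph if p == "dcat:dataset"}
    return sorted(datasets - listed)


standalone = uncatalogued(triples)
```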
@makxdekkers In that case, DCAT expands to service sites that are unrelated to datasets - which may indeed be fine, but could be confusing because of the term "Data" in the name. However, a definition at the beginning of the document could expand the use of "Data" to include information services, and we could emphasize that aspect in other documentation, such as primers, etc.
As with any other RDF vocabulary it is not in our power to control how people use it.
DCAT is designed to be primarily a model for catalogs of datasets, and now also dataservices, and that is what our documentation will describe. But individual classes and predicates in the DCAT namespace might find good use in other applications and I don't see how this could be wrong if it suits a purpose.
Since we touched on this point on the DCAT call (regarding quite what the scope of the catalogue is), I wanted to suggest that we need to be clear about the scope of what we recommend and publish in the new version of the DCAT vocab, while at the same time recognising that there are other situations where our approach to catalogues might be influential.
We don't want (or have the time/effort) to get tangled up in domains which have introduced catalogues for their own purposes even if we suspect that there is likely a common pattern...
I support the idea of not limiting dcat:Catalog's to data-related resources. On the other hand, I recognise the risk of leaving the door too wide open. As @dr-shorthair said, we cannot control how people will use a vocabulary, but at least we can provide guidance on how it should be used.
Thinking about what such guidance could be, an option could be to refer to existing catalogue standards / communities (CSW, OAI-PMH, DataCite), and the different types of resources they support. All these communities are potential users of DCAT (actually, at least in the geo domain, they are already DCAT users), which is one of the motivations for expanding the scope of DCAT. So, we may say something like: if your resources are of one of the types used in such communities, they are in the scope of DCAT. Otherwise, this may not be the case - and here we can give examples of resources that shouldn't be part of a dcat:Catalog (if any).
"at least we can provide guidance on how it should be used."
Perhaps: "at least we can provide guidance on how it can be used for what it was designed for."
I think this can be marked Resolved thanks to #241
DCAT version 1 supported cataloging of Datasets. Two of the Use Cases describe the need to also support cataloguing of Services:
There is evidence of complicated, indirect approaches being used to resolve this in existing deployments - see https://github.com/w3c/dxwg/issues/116#issuecomment-374786075 and https://github.com/w3c/dxwg/issues/166#issuecomment-373878749
The following issues have discussed aspects of the problem and the links between Datasets, Distributions and Distribution services in the context of a DCAT Catalog: #56, #116, #124, #145, #166. Some of these discuss potential solutions before the requirements have been clearly articulated and agreed.
This issue is to clarify the scope of DCAT. When this has been clarified we can then determine the best way to meet the requirements, through creative use or adaptation of the existing vocabularies, or by adding some classes and properties to DCAT, or some other method.