w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

question > is a software solution a dcat:Dataset? #1221

Closed bertvannuffelen closed 4 years ago

bertvannuffelen commented 4 years ago

Dear community,

I would like your advice on the following topic:

Can a software solution be considered as a dcat:Dataset?

makxdekkers commented 4 years ago

@heidivanparys @dr-shorthair's advice still stands. If a manager of a collection of papyrus scrolls considers DCAT useful to describe their stuff, they can use it. There may be an implicit expectation in DCAT that datasets are distributed as digital files, but I don't think this is necessarily so. We shouldn't stop people being creative.

kcoyle commented 4 years ago

@heidivanparys I don't disagree with you at all, but I was speaking strictly about DCAT. I was under the impression that DCAT organizes catalogs that are online. As I questioned above, how would DCAT be used with analog resources? No one has answered that. I find this discussion to be unhelpfully theoretical since concrete examples are sorely lacking. What is the value in speculating in this manner? In any case, there is no DCAT police and anyone can say anything about anything. The issue here should be limited to what is in the text of the DCAT document. My goal was to eliminate the use of suggestions that are unproven.

dr-shorthair commented 4 years ago

I would like to see those modeled in DCAT because I think that getting this discussion more "real" matters. Let's show our work.


samp:Specimen
    rdf:type owl:Class ;
    rdfs:label "Specimen" ;
    rdfs:subClassOf dcat:Resource ;
.

ga:r1985-MtIsa-Wyb-398
    rdf:type samp:Specimen ;
    dct:accessRights [
        rdf:type dct:RightsStatement ;
        dct:description "Access to GA staff only" ;
    ] ;
    dct:created "1985-08-20" ;
    dct:creator <https://orcid.org/0000-0001-5976-4943> ;
    dct:description "Granite from Mt Isa" ;
    dct:identifier "1985-MtIsa-Wyb-398" ;
    dct:type dctype:PhysicalObject ;
    rdfs:label "Specimen 1" ;
    dcat:theme [
        rdf:type skos:Concept ;
        skos:prefLabel "Rock" ;
    ] ;
    prov:qualifiedAttribution [
        rdf:type prov:Attribution ;
        dcat:hadRole <http://registry.it.csiro.au/def/isotc211/CI_RoleCode/custodian> ;
        prov:agent <https://ror.org/04ge02x20> ;
    ] ;
    samp:atStorageLocation [
        rdf:type dct:Location ;
        dct:description "Bay 3a, shelf 7, position 28" ;
    ] ;
    samp:hasSize [
        rdf:type qudt:Quantity ;
        qudt:hasQuantityKind quantitykind:Mass ;
        qudt:quantityValue [
            rdf:type qudt:QuantityValue ;
            qudt:numericValue "0.654"^^dtype:numericUnion ;
            qudt:unit unit:KiloGM ;
        ] ;
    ] ;
    samp:material [
        rdf:type skos:Concept ;
        skos:prefLabel "Rock" ;
    ] ;
.

kcoyle commented 4 years ago

Thanks, @dr-shorthair. I won't claim to have fully understood this, but it appears to me that this is a graph that is a dcat:Resource, yet there is no dcat:Catalog, no catalog record, no dataset and no distribution. Would most consider this a viable use of DCAT? If so, any descriptive metadata "record" could add dcat:Resource, but I don't see the value. What am I missing?

dr-shorthair commented 4 years ago

my:Catalog-of-rock-specimens
    rdf:type dcat:Catalog ;
...
    dcterms:hasPart ga:r1985-MtIsa-Wyb-398 ;
    dcterms:hasPart ga:r1985-MtIsa-Wyb-399 ;
...
.

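For readers who want to see how the "no catalog record" point could be addressed, here is a minimal, self-contained sketch. It is an illustration only, not part of the original comment: the `my:` and `ga:` namespace IRIs, the catalog title, and the record identifier `my:record-398` are placeholders invented for the example. Only `dcat:Catalog`, `dcat:CatalogRecord`, `dcat:record`, `foaf:primaryTopic` and `dcterms:hasPart` are taken from DCAT 2 and the vocabularies it reuses.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
# Placeholder namespaces, assumed for illustration only.
@prefix my:      <http://example.org/my/> .
@prefix ga:      <http://example.org/ga/> .

# The catalog lists the specimen as a part and points to a record about it.
my:Catalog-of-rock-specimens
    a dcat:Catalog ;
    dcterms:title "Catalog of rock specimens"@en ;   # hypothetical title
    dcterms:hasPart ga:r1985-MtIsa-Wyb-398 ;
    dcat:record my:record-398 .

# The catalog record describes the registration of the specimen in the catalog.
my:record-398
    a dcat:CatalogRecord ;
    foaf:primaryTopic ga:r1985-MtIsa-Wyb-398 .
```

The record is optional in DCAT; it is shown here only because it keeps metadata about the catalog entry separate from metadata about the physical specimen.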
aidig commented 4 years ago

In terms of interoperability assets, ADMS-AP for Joinup considers software solutions to be datasets (see makxdekkers' comment on 4 Mar).

The Application Profile is used for the aggregation of information about interoperability assets (controlled vocabularies, metadata schemas) and software solutions by the federated repositories on the Joinup platform, online collections of interoperability solutions maintained by European public administrations, businesses and citizens.

Example: The software solution asset LEOS (Legislation Editing Open Software) is described as a dcat:Dataset:
HTML on Joinup - https://joinup.ec.europa.eu/solution/leos-open-source-software-editing-legislation
TTL metadata - https://joinup.ec.europa.eu/rdf-export/rdf_entity/http_e_f_fec_ceuropa_ceu_fleos/turtle (snippet below)

<http://ec.europa.eu/leos>
  a dcat:Dataset ;
  dc:type <http://data.europa.eu/dr8/InteroperableEuropeanSolutionService>, 
          <http://data.europa.eu/dr8/TechnicalSpecification> ;
  dc:title "LEOS - Open Source software for editing legislation"@en ;  
  dc:description "LEOS Legislation Editing Open Software, is an open source software delivered under ISA² Action 2016.38 Legislation Interoperability Tools – LegIT
...

ADMS-AP for Joinup version 2.0 removes all use of the Asset Description Metadata Schema for Software (admssw), which has been deprecated, and now seems to rely solely on ADMS for the description of assets. Assets are also classified using EIRA: https://joinup.ec.europa.eu/svn/eia/taxonomy/EIRA_SKOS.rdf

However, prior to the ADMS revision, it was commented that using just one specification for datasets, standards and software might make the specification less understandable. The number of relevant asset layers (for software releases) also appeared to present different implementation options. (See revision discussion: https://joinup.ec.europa.eu/solution/asset-description-metadata-schema-adms-revision)

smrgeoinfo commented 4 years ago

Kinds of resources should be differentiated based on what a USER needs to know about those resources.
For a dataset, I'd assert a user needs to know what the conceptual model and serialization scheme are for the data accessible on the WEB, as well as some provenance metadata to assess fitness for use. For software, a user needs to know what is necessary to execute the software: which operating system, interpreter or compiler it requires, what the interchange formats (conceptual model, serialization) are for input data, and what kind of output will be generated. Fitness for use is constrained only by what kind of input the software can use, and by whether the output is in a format (conceptual, schematic, syntactic) that my application can use.

These are obviously DIFFERENT, so dataset and software SHOULD have different resource types, and by implication, different content models for how they are described.
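The contrast above can be sketched in Turtle. This is a hedged illustration, not a proposal from the thread: every `ex:` resource and the properties `ex:acceptsInputModel` and `ex:producesOutputModel` are hypothetical, the literal values are invented, and typing the software with `schema:SoftwareApplication` is one possible choice among several. The DCAT, Dublin Core and schema.org terms used are existing ones.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema:  <https://schema.org/> .
# Placeholder namespace, assumed for illustration only.
@prefix ex:      <http://example.org/> .

# Dataset: conceptual model, serialization, and provenance matter to the user.
ex:rock-analyses
    a dcat:Dataset ;
    dcterms:conformsTo ex:rock-analysis-model ;          # conceptual model
    dcterms:provenance [
        a dcterms:ProvenanceStatement ;
        rdfs:label "Hypothetical provenance note"@en
    ] ;
    dcat:distribution [
        a dcat:Distribution ;
        dcat:mediaType <https://www.iana.org/assignments/media-types/text/csv>
    ] .

# Software: execution requirements and input/output formats matter to the user.
ex:rock-classifier
    a dcat:Resource , schema:SoftwareApplication ;
    schema:operatingSystem "Linux" ;
    schema:softwareRequirements "Python 3 interpreter" ;
    ex:acceptsInputModel ex:rock-analysis-model ;        # hypothetical property
    ex:producesOutputModel ex:rock-class-model .         # hypothetical property
```

The two descriptions share almost no properties, which is the point being made: a single content model would leave most fields empty for one kind of resource or the other.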

UNLESS all we're interested in is a software resource as a serialized byte stream that might be executable, but not interested in actually using it.

riccardoAlbertoni commented 4 years ago

@kcoyle wrote:

@riccardoAlbertoni I like your definition, although the last sentence still slips over into some data types that I think are a bit suspect. Can we find a way to say it without naming types that we aren't sure about? Datasets surely must be digital in nature (ones and zeroes). They also are "sets" ("collections of data"). Saying that may be sufficient.

The examples included so far were already in DCAT 2 (section 5.1, third item in the list). That, of course, does not mean we cannot change them; I am just making clear that they are not something we are adding now. Although I concur with your suggestion to rely on real examples rather than imagined ones, I think the examples here serve to convey that the notion of a dataset in DCAT is broad and inclusive. Also, in this specific list, I do not see any items that I would consider an evident stretch. So, from my side, I would be inclined to keep them, as they explicitly convey the breadth and inclusivity of a DCAT dataset. Does this make any sense to you (@kcoyle)?

kcoyle commented 4 years ago

I'm ok with leaving it, but eventually I would like to see use cases for these - not just about creating the catalog but about user services from the catalog. By the latter I mean: how do you expect these catalogs to serve users, and does that include users who are doing general online searching for materials? Thanks, @riccardoAlbertoni

riccardoAlbertoni commented 4 years ago

As discussed in tonight's meeting, we close the issue as the original question is addressed. Spin-off discussions are tracked in separate issues or implemented in the PR.