w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

DCAT - schema.org alignment (profile) #251

Closed: dr-shorthair closed this issue 5 years ago

dr-shorthair commented 6 years ago

Prepare a formal alignment of DCAT with schema.org

Also see #65

dr-shorthair commented 6 years ago

This is a separate issue to prepare the alignment.

#65 concerns guidance on using DCAT in weakly axiomatized environments, which probably depends on the alignment/mapping but is a distinct concern.

Relevant commentary copied over from #65

@andrea-perego commented on 20 Jan: Some work carried out in the framework of SDW could turn out to be relevant here:

https://www.w3.org/2015/spatial/wiki/ISO_19115_-_DCAT_-_Schema.org_mapping

The specified mappings were based on existing work, including a mapping of DCAT-AP, GeoDCAT-AP, and StatDCAT-AP to Schema.org:

https://ec-jrc.github.io/dcat-ap-to-schema-org/

@makxdekkers commented on 7 Feb: This would be very useful for tools that could create the relevant schema.org markup for a landing page for the dataset based on the DCAT description.

@andrea-perego commented on 8 Feb, quoting @makxdekkers:

> This would be very useful for tools that could create the relevant schema.org markup for a landing page for the dataset based on the DCAT description.

The DCAT-AP to Schema.org mapping exercise I mentioned earlier in this thread also includes SPARQL queries implementing such mappings:

https://github.com/ec-jrc/dcat-ap-to-schema-org/tree/master/sparql
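As a rough illustration of the kind of tool @makxdekkers describes, the sketch below turns a DCAT description into schema.org JSON-LD for embedding in a landing page. It assumes the DCAT description is already available as a plain Python dict keyed by CURIEs (a real tool would read RDF), and the property table is a hypothetical subset of the term pairs listed further down this thread, not the published alignment file. The function names are made up for illustration.

```python
import json

# Hypothetical subset of DCAT -> schema.org property pairs, one target each,
# picked from the alignment table in this thread (not the published file).
DCAT_TO_SCHEMA = {
    "dct:title": "name",
    "dct:description": "description",
    "dct:issued": "datePublished",
    "dct:modified": "dateModified",
    "dct:license": "license",
    "dcat:keyword": "keywords",
    "dcat:landingPage": "url",
}


def to_schema_jsonld(dcat_dataset: dict) -> dict:
    """Map a DCAT dataset (a dict keyed by CURIEs) to schema.org JSON-LD."""
    doc = {"@context": "https://schema.org/", "@type": "Dataset"}
    for dcat_prop, value in dcat_dataset.items():
        target = DCAT_TO_SCHEMA.get(dcat_prop)
        if target is not None:  # DCAT properties without a mapping are dropped
            doc[target] = value
    return doc


def landing_page_snippet(dcat_dataset: dict) -> str:
    """Wrap the JSON-LD in the script element used for embedded markup."""
    body = json.dumps(to_schema_jsonld(dcat_dataset), indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    record = {
        "dct:title": "Air quality measurements",  # illustrative values only
        "dct:description": "Hourly NO2 readings.",
        "dcat:keyword": ["air quality", "NO2"],
        "dct:accrualPeriodicity": "hourly",  # no mapping in the table: ignored
    }
    print(landing_page_snippet(record))
```

Keeping exactly one schema.org target per DCAT property sidesteps the one-to-many question raised later in the thread; the SPARQL queries linked above are the more complete route.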

@dr-shorthair commented on 1 Mar: I propose adding an RDF alignment document for DCAT-schema.org to enable this discussion.

@dr-shorthair commented on 7 Mar: More mappings from SDWWG and Project Open Data added in #146.

@dr-shorthair commented on 23 Mar: Preliminary DCAT-schema.org alignment here: https://github.com/w3c/dxwg/blob/gh-pages/dcat/rdf/schema.ttl

dr-shorthair commented 6 years ago

RDF file renamed to dcat-schema.ttl. Work is underway in branch /dcat-alignments-schema/; see https://github.com/w3c/dxwg/blob/dcat-alignments-schema/dcat/rdf/dcat-schema.ttl

dr-shorthair commented 6 years ago
DCAT element mappings to schema.org:
dc:description owl:equivalentProperty schema:description
dc:title owl:equivalentProperty schema:name
dct:description owl:equivalentProperty schema:description
dct:format [ owl:inverseOf rdfs:subPropertyOf ] schema:encodingFormat
dct:identifier owl:equivalentProperty schema:identifier
dct:issued owl:equivalentProperty schema:datePublished
dct:language owl:equivalentProperty schema:inLanguage
dct:license owl:equivalentProperty schema:license
dct:modified owl:equivalentProperty schema:dateModified
dct:publisher owl:equivalentProperty schema:publisher
dct:spatial owl:equivalentProperty schema:spatial
  owl:equivalentProperty schema:spatialCoverage
dct:temporal owl:equivalentProperty schema:datasetTimeInterval
  owl:equivalentProperty schema:temporal
  owl:equivalentProperty schema:temporalCoverage
dct:title owl:equivalentProperty schema:name
dct:type owl:equivalentProperty schema:additionalType
dcat:Catalog owl:equivalentClass schema:DataCatalog
dcat:DataService owl:equivalentClass schema:DataFeed
  Note: not quite sure whether a DataFeed is a data service or a data collection. From a REST viewpoint there is no difference, but other APIs support additional queries, slices, etc., which make characterizing the service more efficient than listing all the (potentially infinite) resources available from it.
dcat:Dataset owl:equivalentClass schema:Dataset
dcat:Distribution owl:equivalentClass schema:DataDownload
dcat:Resource rdfs:subClassOf schema:Thing
dcat:accessURL rdfs:subPropertyOf schema:contentUrl
  schema:domainIncludes dcat:Distribution , schema:DataDownload
  schema:rangeIncludes rdfs:Resource , schema:URL
dcat:byteSize rdfs:subPropertyOf schema:contentSize
  schema:domainIncludes dcat:Distribution , schema:DataDownload
  schema:rangeIncludes rdfs:Literal , schema:Text
dcat:catalog schema:domainIncludes dcat:Catalog , schema:DataCatalog
  schema:rangeIncludes dcat:Catalog , schema:DataCatalog
dcat:contactPoint owl:equivalentProperty schema:contactPoint
  schema:domainIncludes dcat:Dataset , dcat:DataService , schema:Dataset
dcat:dataset owl:equivalentProperty schema:dataset
  schema:domainIncludes dcat:Catalog , schema:DataCatalog
  schema:rangeIncludes dcat:Dataset , schema:Dataset
dcat:distribution owl:equivalentProperty schema:distribution
  schema:domainIncludes dcat:Dataset , schema:Dataset
  schema:rangeIncludes dcat:Distribution , schema:DataDownload
dcat:downloadURL rdfs:subPropertyOf schema:contentUrl
  schema:domainIncludes dcat:Distribution , schema:DataDownload
  schema:rangeIncludes rdfs:Resource , schema:Thing
dcat:keyword rdfs:subPropertyOf schema:keywords
  Note: dcat:keyword is singular, schema:keywords is plural.
  schema:domainIncludes dcat:Resource , dcat:Dataset , dcat:DataService , schema:Dataset
  schema:rangeIncludes rdfs:Literal , schema:Text
dcat:landingPage rdfs:subPropertyOf schema:url
  schema:domainIncludes dcat:Resource , dcat:Dataset , dcat:DataService , schema:Dataset
  schema:rangeIncludes foaf:Document , schema:WebPage
dcat:mediaType owl:equivalentProperty schema:encodingFormat
  schema:domainIncludes dcat:Distribution , schema:DataDownload
  schema:rangeIncludes dct:MediaTypeOrExtent
dcat:record schema:domainIncludes dcat:Catalog , schema:DataCatalog
  schema:rangeIncludes dcat:CatalogRecord
dcat:service schema:domainIncludes dcat:Catalog , schema:DataCatalog
  schema:rangeIncludes dcat:DataService
dcat:theme owl:equivalentProperty schema:about
  schema:domainIncludes dcat:Resource , dcat:Dataset , dcat:DataService , schema:Dataset
  schema:rangeIncludes skos:Concept , schema:Class
dcat:themeTaxonomy schema:domainIncludes dcat:Catalog , schema:DataCatalog
  schema:rangeIncludes skos:ConceptScheme
foaf:Organization owl:equivalentClass schema:Organization
foaf:homepage owl:equivalentProperty schema:url
foaf:mbox owl:equivalentProperty schema:email
Also potentially of interest, to be mixed in for scientific data descriptions:
sosa:hasFeatureOfInterest rdfs:subPropertyOf schema:object
sosa:hasResult rdfs:subPropertyOf schema:result
sosa:madeByActuator rdfs:subPropertyOf schema:instrument
sosa:madeBySampler rdfs:subPropertyOf schema:instrument
sosa:madeBySensor rdfs:subPropertyOf schema:instrument
sosa:observedProperty owl:equivalentProperty schema:variableMeasured
sosa:phenomenonTime rdfs:subPropertyOf schema:temporalCoverage
sosa:resultTime rdfs:subPropertyOf schema:endTime
sosa:usedProcedure owl:equivalentProperty schema:measurementTechnique

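Some items were struck through because they are marked 'superseded' in schema.org, including schema:spatial (superseded by schema:spatialCoverage) and schema:temporal and schema:datasetTimeInterval (both superseded by schema:temporalCoverage).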
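The rows above mix three kinds of axiom, and they license inference in different directions: owl:equivalentProperty works both ways, rdfs:subPropertyOf only from the sub-property up to the super-property, and the `[ owl:inverseOf rdfs:subPropertyOf ]` row on dct:format is read here as "schema:encodingFormat is a sub-property of dct:format". The sketch below encodes four rows as plain tuples (a simplification, not a parse of dcat-schema.ttl) and adds the triples a single axiom step entails. The function names are hypothetical.

```python
EQUIV = "owl:equivalentProperty"
SUB = "rdfs:subPropertyOf"

# A few rows from the table above, as (property, axiom, property) tuples.
ALIGNMENT = [
    ("dct:title", EQUIV, "schema:name"),
    ("dcat:keyword", SUB, "schema:keywords"),
    ("dcat:landingPage", SUB, "schema:url"),
    # "dct:format [ owl:inverseOf rdfs:subPropertyOf ] schema:encodingFormat"
    # is read here as "schema:encodingFormat rdfs:subPropertyOf dct:format".
    ("schema:encodingFormat", SUB, "dct:format"),
]


def entailed_properties(prop):
    """Properties that one alignment axiom lets us infer from prop.

    owl:equivalentProperty works in both directions; rdfs:subPropertyOf
    only from the sub-property to the super-property.
    """
    result = set()
    for left, axiom, right in ALIGNMENT:
        if left == prop:
            result.add(right)
        elif axiom == EQUIV and right == prop:
            result.add(left)
    return result


def translate(triples):
    """Return the input triples plus those entailed by a single axiom step."""
    out = set(triples)
    for s, p, o in triples:
        for q in entailed_properties(p):
            out.add((s, q, o))
    return out


if __name__ == "__main__":
    data = {("ex:d1", "dcat:keyword", "soil"), ("ex:d1", "dct:format", "text/csv")}
    for triple in sorted(translate(data)):
        print(triple)
```

In the usage example dcat:keyword yields a schema:keywords triple, but dct:format yields no schema:encodingFormat triple, because that axiom only runs from schema:encodingFormat to dct:format.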

dr-shorthair commented 6 years ago

Given the interest in Google Dataset Search, should we put this into 2PWD to trigger feedback?

PeterParslow commented 6 years ago

I'm pleased you're looking at this. Could you supersede https://www.w3.org/2015/spatial/wiki/ISO_19115_-_DCAT_-_Schema.org_mapping when this is mature enough? Perhaps just add a note to that wiki draft pointing to this issue.

That older wiki draft is what comes up top when I search for "DCAT vs Schema.org".

larsgsvensson commented 6 years ago

The alignments dct:format owl:equivalentProperty schema:encodingFormat and dcat:mediaType owl:equivalentProperty schema:encodingFormat imply dct:format owl:equivalentProperty dcat:mediaType, since owl:equivalentProperty is transitive.
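To make the inference concrete: owl:equivalentProperty behaves as an equivalence relation (symmetric and transitive), so the connected components of the asserted pairs are the inferred equivalence classes. The sketch below computes them with a small union-find over the two assertions quoted above plus the unrelated dct:title/schema:name row. It is an illustration, not an OWL reasoner, and the function name is made up.

```python
from collections import defaultdict


def equivalence_classes(pairs):
    """Group properties linked by owl:equivalentProperty assertions.

    Because the relation is symmetric and transitive, every property in a
    connected group is equivalent to every other one (union-find).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    classes = defaultdict(set)
    for term in parent:
        classes[find(term)].add(term)
    return [frozenset(c) for c in classes.values()]


ASSERTED = [
    ("dct:format", "schema:encodingFormat"),
    ("dcat:mediaType", "schema:encodingFormat"),
    ("dct:title", "schema:name"),
]

if __name__ == "__main__":
    for group in equivalence_classes(ASSERTED):
        print(sorted(group))
```

dct:format and dcat:mediaType land in the same class even though nobody asserted that pair directly. Replacing either assertion with a weaker link such as skos:closeMatch, which SKOS does not define as transitive, would keep the two DCAT properties apart.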

danbri commented 6 years ago

My notes from yesterday towards a mapping, tested by applying it to turn the Google Dataset Search documentation into DCAT RDF, are in https://twitter.com/danbri/status/1055787656318214144

The Python notebook runs a SPARQL CONSTRUCT query, based on @dr-shorthair's term mappings:

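PREFIX s: <http://schema.org/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

# Build a DCAT description from schema.org Dataset markup
CONSTRUCT {
  ?d a dcat:Dataset ; dct:title ?t ; dct:description ?desc .
  # a fresh blank-node distribution is minted for each query solution
  ?d dcat:distribution [ a dcat:Distribution ; dct:format ?ef ] .
  # triples with an unbound ?lic or ?tempC are simply not produced
  ?d dct:license ?lic .
  ?d dct:temporal ?tempC .
} WHERE {
  # name, description and at least one distribution format are required
  ?d a s:Dataset ; s:name ?t ; s:description ?desc .
  ?d s:distribution [ s:encodingFormat ?ef ] .
  # license and temporal coverage are copied only when present
  OPTIONAL { ?d s:license ?lic . }
  OPTIONAL { ?d s:temporalCoverage ?tempC . }
}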

It would be interesting to get at least one such query blessed by the WG. One motivation for doing so would be to communicate to Google what the WG thinks a DCAT version of a particular example ought to look like.

This works in the opposite direction to https://ec-jrc.github.io/dcat-ap-to-schema-org/#formal-definition-sparql-mapping-properties-dataset but uses the same basic approach.
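For readers without a SPARQL engine to hand, the sketch below mirrors the logic of the CONSTRUCT query above in plain Python, over schema.org JSON-LD that has already been parsed into dicts. It assumes each dataset dict has an "@id", single-valued properties, and a list under "distribution"; the function name and blank-node labels are hypothetical. It only shows which patterns are required and which are OPTIONAL, and is not a replacement for the notebook.

```python
from itertools import count


def schema_to_dcat(datasets):
    """Mimic the CONSTRUCT query over schema.org Dataset dicts.

    Datasets missing name, description or any distribution encodingFormat
    are skipped, as the query's required patterns would skip them; license
    and temporalCoverage are copied only if present (the OPTIONAL blocks).
    """
    triples = []
    bnode_ids = count()
    for ds in datasets:
        d = ds["@id"]
        formats = [dist["encodingFormat"]
                   for dist in ds.get("distribution", [])
                   if "encodingFormat" in dist]
        if (ds.get("@type") != "Dataset" or "name" not in ds
                or "description" not in ds or not formats):
            continue
        triples += [(d, "rdf:type", "dcat:Dataset"),
                    (d, "dct:title", ds["name"]),
                    (d, "dct:description", ds["description"])]
        for ef in formats:
            # One fresh blank node per format; the SPARQL query mints one per
            # solution, which multiplies if optional values are repeated.
            dist = f"_:dist{next(bnode_ids)}"
            triples += [(d, "dcat:distribution", dist),
                        (dist, "rdf:type", "dcat:Distribution"),
                        (dist, "dct:format", ef)]
        for key, prop in (("license", "dct:license"),
                          ("temporalCoverage", "dct:temporal")):
            if key in ds:
                triples.append((d, prop, ds[key]))
    return triples
```

A dataset with no s:description disappears from the output entirely, which may or may not be what a WG-blessed version of the query should do; wrapping more patterns in OPTIONAL is the obvious alternative.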

kcoyle commented 6 years ago

Following on from @larsgsvensson's comment (and maybe this should be its own GitHub issue): all of the one-to-many equivalences imply that the "many" have the same semantics, which is probably not the case, or schema.org would not have different properties. One could look at using subproperties here if that fits. @dr-shorthair

dct:spatial owl:equivalentProperty schema:spatial
  owl:equivalentProperty schema:spatialCoverage
dct:temporal owl:equivalentProperty schema:datasetTimeInterval
  owl:equivalentProperty schema:temporal
  owl:equivalentProperty schema:temporalCoverage

P.S. This has given me renewed respect for the library practice of "use for" between terms, as it seems to me closer to what is meant by a vocabulary cross-walk. It doesn't mean that the terms are semantically equivalent; it just means that you substitute one for the other even if the meaning isn't exact. Cross-walks are usually semantically lossy to some degree.

kcoyle commented 6 years ago

I have another question about the 1-to-many entries. Wouldn't they only work in the direction of many-to-one? You can translate schema:datasetTimeInterval and schema:temporal (etc.) to dct:temporal, but if you are going in the direction from DCAT to schema.org, which of the three schema.org properties would you translate dct:temporal to?

It seems that if the two vocabularies in a cross-walk are not symmetrical, then you actually need two cross-walks (or some clever use of the table format) to show both directions. You may decide that you don't need both directions, but in that case the one-to-many properties should be reduced to one-to-one from DCAT to schema.org.
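The direction problem can be checked mechanically: index the same term pairs both ways and list every source term that has more than one candidate target. The sketch below uses only the spatial/temporal rows quoted above plus dct:title/schema:name; the helper names are made up for illustration.

```python
from collections import defaultdict


def build_crosswalks(mappings):
    """Index (dcat_term, schema_term) pairs in both directions."""
    forward, reverse = defaultdict(set), defaultdict(set)
    for dcat_term, schema_term in mappings:
        forward[dcat_term].add(schema_term)
        reverse[schema_term].add(dcat_term)
    return forward, reverse


def ambiguous(crosswalk):
    """Source terms with more than one candidate target, targets sorted."""
    return {src: sorted(targets)
            for src, targets in crosswalk.items() if len(targets) > 1}


MAPPINGS = [
    ("dct:temporal", "schema:datasetTimeInterval"),
    ("dct:temporal", "schema:temporal"),
    ("dct:temporal", "schema:temporalCoverage"),
    ("dct:spatial", "schema:spatial"),
    ("dct:spatial", "schema:spatialCoverage"),
    ("dct:title", "schema:name"),
]

forward, reverse = build_crosswalks(MAPPINGS)

if __name__ == "__main__":
    print("DCAT -> schema.org needs a choice for:", ambiguous(forward))
    print("schema.org -> DCAT needs a choice for:", ambiguous(reverse))
```

For these rows only the DCAT-to-schema.org direction is ambiguous. Adding rows such as dc:title and dct:title, which both map to schema:name, would make the reverse direction ambiguous too.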

dr-shorthair commented 5 years ago

Thanks @kcoyle -

Concerning the one-to-many relations: schema:spatial is supersededBy schema:spatialCoverage, and both schema:temporal and schema:datasetTimeInterval are supersededBy schema:temporalCoverage, so the older forms are not included in the document, only in this issue thread.
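Dropping superseded targets is what turns these one-to-many rows back into one-to-one mappings. A minimal sketch, hard-coding only the three supersededBy links named in the comment above (a real tool would read schema:supersededBy from the schema.org vocabulary itself); the function names are hypothetical.

```python
# The supersededBy links named in the comment above, hard-coded for brevity.
SUPERSEDED_BY = {
    "schema:spatial": "schema:spatialCoverage",
    "schema:temporal": "schema:temporalCoverage",
    "schema:datasetTimeInterval": "schema:temporalCoverage",
}


def current_term(term, superseded_by=SUPERSEDED_BY):
    """Follow supersededBy links until reaching a term that is not superseded."""
    seen = set()
    while term in superseded_by:
        if term in seen:
            raise ValueError(f"supersededBy cycle at {term}")
        seen.add(term)
        term = superseded_by[term]
    return term


def prune(targets):
    """Replace superseded targets by their successors and deduplicate."""
    return sorted({current_term(t) for t in targets})


if __name__ == "__main__":
    print(prune(["schema:datasetTimeInterval", "schema:temporal",
                 "schema:temporalCoverage"]))
    print(prune(["schema:spatial", "schema:spatialCoverage"]))
```

Both dct:temporal and dct:spatial then have a single schema.org target, which is why only the current forms need to appear in the alignment document.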

dr-shorthair commented 5 years ago

I've fixed the incorrect equivalent-class axioms and relaxed some of the stronger alignments from equivalent to skos:closeMatch; see https://github.com/w3c/dxwg/pull/609

davebrowning commented 5 years ago

It looks to me as if we have completed this, at least for DCAT2, or is there something more to be done? (It could always be revisited.) Looks like this would be better in the 'DCAT Alignment' milestone?

dr-shorthair commented 5 years ago

Indeed. Closing.