w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

Are the dcat definitions and usage notes available in RDF? #1419

Closed simsong closed 2 years ago

simsong commented 2 years ago

Allow Definitions and Usage Notes to be read programmatically

Creator: @simsong

Deliverable(s): DCAT3

It would sure be useful if I could read the Definition and Usage note text from the DCAT specification from within a Python program using RDF, rather than having to web-scrape the HTML file. But I can't find where the information is located.

Use Case

I want to write a function that, given "dcat:landingPage", returns the text string "If the distribution(s) are accessible only through a landing page (i.e. direct download URLs are not known), then the landing page link SHOULD be duplicated as dcat:accessURL on a distribution. (see § 5.7 Dataset available only behind some Web page)"

I want it read from this: https://www.w3.org/TR/vocab-dcat/

dr-shorthair commented 2 years ago

I want it read from this: https://www.w3.org/TR/vocab-dcat/

Sorry - that is the specification document. The RDF can be found from the namespace URI - http://www.w3.org/ns/dcat where you can find directions to the RDF artefacts. You can also go to the maintenance GitHub site, which is also mentioned in the spec document.

simsong commented 2 years ago

Thank you so much. So it is at https://www.w3.org/ns/dcat2.rdf which is linked from https://www.w3.org/ns/dcat. That's incredibly useful. Do you know where the similar documents for dcat-us are located?
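For readers with the same use case, here is a minimal sketch of the lookup using only the Python standard library. It makes several assumptions that should be checked against the actual file: that usage notes are carried as skos:scopeNote and definitions as skos:definition, that each term appears as an element with an rdf:about attribute, and that the language tag sits directly on the literal element. The names `notes_for` and `fetch` are hypothetical helpers, and a real RDF library would be more robust than scanning the XML like this.

```python
import urllib.request
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
DCAT = "http://www.w3.org/ns/dcat#"

# ElementTree addresses namespaced names in "{namespace}local" form.
ABOUT = f"{{{RDF}}}about"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
SCOPE_NOTE = f"{{{SKOS}}}scopeNote"    # assumed to hold the usage note
DEFINITION = f"{{{SKOS}}}definition"   # assumed to hold the definition


def fetch(url="https://www.w3.org/ns/dcat2.rdf"):
    """Download the RDF/XML file and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def notes_for(rdf_xml, term_iri, tag=SCOPE_NOTE, lang="en"):
    """Return the literal values of `tag` attached to `term_iri`.

    Only elements that name the term via rdf:about are considered, and only
    literals whose own xml:lang attribute equals `lang` are returned.
    """
    root = ET.fromstring(rdf_xml)
    found = []
    for node in root.iter():
        if node.get(ABOUT) != term_iri:
            continue
        for child in node:
            if child.tag == tag and child.get(XML_LANG) == lang and child.text:
                found.append(child.text.strip())
    return found
```

With network access, `notes_for(fetch(), DCAT + "landingPage")` would return the usage notes if the assumptions above hold, and passing `tag=DEFINITION` would return the definition text instead.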

dr-shorthair commented 2 years ago

What is dcat-us?

simsong commented 2 years ago

dcat-us is the version of DCAT that the US Government is using on https://data.gov. It is based on DCATv1. It is described here: https://resources.data.gov/resources/dcat-us/. It seems to have no namespaces. The HTML document was clearly automatically generated, but I cannot find the source documents or the source repo.

pwin commented 2 years ago

https://resources.data.gov/resources/podm-field-mapping/ has some links to the JSON schema and json-ld context, but I'm unable to find any further schema sources for the US version

kcoyle commented 2 years ago

Ditto to the above, @simsong and @pwin. I also happened to be looking at dcat-us this week and was concerned about the lack of namespaces: their vocab mixes dcat, dct and other properties without full identification. They claim to use JSON-LD, but without namespaces I don't know how that would work. I did wander through their GitHub repo, and there is some dcat-us JSON code at https://github.com/GSA/resources.data.gov/tree/main/pages/schemas/dcat-us/v1.1/schema but I haven't checked whether it corresponds to the documentation. It does seem to be the same as what is displayed on the page that Peter linked to. I'll report back if I do more work on it.

simsong commented 2 years ago

I am cleaning up dcat-us. You can find the work over here: https://github.com/usdhs/dcat-tool/blob/main/schemata/usg.ttl

I welcome suggestions, corrections, and pull requests. I'm just a beginner here.

makxdekkers commented 2 years ago

@simsong, I see in the ttl file that a range of rdfs:Literal is assigned to properties that are defined as owl:ObjectProperty. Now I am not an expert in OWL, but isn't it the case that owl:DatatypeProperty is the kind of property that is meant to connect individuals with literals? There are for example properties like usg:hostingLocation and usg:datasetClassification with range rdfs:Literal, but should those not be things rather than strings?
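The mismatch described here can be checked mechanically. The sketch below assumes the triples have already been loaded by some RDF parser into plain (subject, predicate, object) tuples of IRI strings. The function name and the `http://example.org/usg#` namespace in the usage example are hypothetical, since the real usg namespace IRI is not given in this thread.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
RDFS_LITERAL = "http://www.w3.org/2000/01/rdf-schema#Literal"
OWL_OBJECT_PROPERTY = "http://www.w3.org/2002/07/owl#ObjectProperty"


def literal_ranged_object_properties(triples):
    """Return IRIs declared as owl:ObjectProperty whose rdfs:range is rdfs:Literal."""
    triples = set(triples)
    # Subjects typed as owl:ObjectProperty.
    object_props = {
        s for s, p, o in triples
        if p == RDF_TYPE and o == OWL_OBJECT_PROPERTY
    }
    # Subjects whose declared range is rdfs:Literal.
    literal_ranged = {
        s for s, p, o in triples
        if p == RDFS_RANGE and o == RDFS_LITERAL
    }
    # A property in both sets is the combination questioned above.
    return sorted(object_props & literal_ranged)


# Usage with a hypothetical namespace standing in for usg:
USG = "http://example.org/usg#"
example = [
    (USG + "hostingLocation", RDF_TYPE, OWL_OBJECT_PROPERTY),
    (USG + "hostingLocation", RDFS_RANGE, RDFS_LITERAL),
]
flagged = literal_ranged_object_properties(example)
```

Each flagged property would need either its type changed to owl:DatatypeProperty or its range changed to a class, depending on whether its values are meant to be strings or things.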

simsong commented 2 years ago

Hi. If you are referring to the file usg.ttl, my request is that we move the discussion to https://github.com/usdhs/dcat-tool/issues.

If you can make specific recommendations for what to change, that would be great! Again, I am a beginner at this.

With respect to usg:datasetClassification, unfortunately my current plan is for this to be a string that is parsed, rather than for a well-developed ontology. This is a result of operational concerns: it is not possible for me to document all of the possible values of this string. So this is a way of kicking the can down the road.