Author
@stanozr
Date
2023-06-13
Description
This report takes a dataset found in OIH that comes from the Pacific Data Hub (PDH) and compares three things: the source record in the PDH, the JSON-LD produced for it, and the result shown in OIH.
In CKAN we use the DCAT extension: https://extensions.ckan.org/extension/dcat/#structured-data-and-google-dataset-search-indexing
Dataset
Title
RMI Updated Report on the Barbados Programme of Action (BPOA), 2004
Data Source
Pacific Data dataset URL
https://pacificdata.org/data/dataset/rmi-updated-report-on-the-barbados-programme-of-action-bpoa3e18de38-4a91-4d92-80cb-550ba15a1179
Metadata (PDH)
Resource (download link in PDH)
https://pacific-data.sprep.org/system/files/RMI%2520update%2520report%2520on%2520BPOA_0.pdf
Notes
1) This is a document (dcat type = text), not structured data (dcat type = dataset)
2) This “dataset” contains only one resource
JSON-LD
Full JSON-LD: https://pacificdata.org/data/dataset/rmi-updated-report-on-the-barbados-programme-of-action-bpoa3e18de38-4a91-4d92-80cb-550ba15a1179.jsonld
Here’s an extract of the JSON @graph object, cleaned up for readability (nested values used instead of @id node references, unused values removed, descriptions shortened):
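The "nested values instead of @id node references" cleanup can be sketched in a few lines of Python. This is only an illustration, not what the CKAN extension or OIH actually runs: the function name `inline_references` and the sample graph (including the `example.org` URLs) are made up for the example.

```python
import json


def inline_references(graph):
    """Return the nodes of a JSON-LD @graph with pure {"@id": ...} references
    replaced by the node they point to, when that node is in the same graph."""
    by_id = {node["@id"]: node for node in graph if "@id" in node}

    def resolve(value, seen):
        if isinstance(value, list):
            return [resolve(item, seen) for item in value]
        if isinstance(value, dict):
            ref = value.get("@id")
            # A pure reference is a dict whose only key is "@id".
            # `seen` guards against cycles between nodes.
            if len(value) == 1 and ref in by_id and ref not in seen:
                return resolve(by_id[ref], seen | {ref})
            return {key: resolve(item, seen) for key, item in value.items()}
        return value

    return [resolve(node, frozenset({node.get("@id")})) for node in graph]


# Hypothetical two-node graph: a dataset pointing at a blank-node distribution.
graph = [
    {
        "@id": "https://example.org/dataset/1",
        "@type": "schema:Dataset",
        "schema:name": "Example report",
        "schema:distribution": {"@id": "_:b0"},
    },
    {
        "@id": "_:b0",
        "@type": "schema:DataDownload",
        "schema:url": "https://example.org/files/report.pdf",
    },
]

nested = inline_references(graph)
print(json.dumps(nested[0], indent=2))
```

The input graph is left untouched; only the returned copy carries the nested distribution.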
Notes:
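1) Properties are prefixed with “schema:” (valid but unnecessary)
2) Google Datasets validates this, but complains that “url” properties should be named “contentUrl”. To be fixed in the CKAN DCAT extension’s profile definition.
3) Generated JSON-LD uses references to nodes. This seems to be how the CKAN extension works.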
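To make the first two notes concrete, here is a rough Python sketch of the output the fix is aiming for: the “schema:” prefix dropped and “url” renamed to “contentUrl” on download nodes. It is a post-processing illustration only; the real fix belongs in the DCAT extension's profile, and the helper name `simplify` and the sample values are hypothetical. Dropping the prefix is only safe if the @context maps unprefixed terms to schema.org (for example through `@vocab`), which is assumed here.

```python
PREFIX = "schema:"


def _unprefix(term):
    """Drop a leading 'schema:' from a term; leave anything else unchanged."""
    if isinstance(term, str) and term.startswith(PREFIX):
        return term[len(PREFIX):]
    return term


def simplify(node):
    """Recursively unprefix property names and @type values, and rename
    'url' to 'contentUrl' on DataDownload nodes."""
    if isinstance(node, list):
        return [simplify(item) for item in node]
    if not isinstance(node, dict):
        return node

    result = {}
    for key, value in node.items():
        if key == "@type":
            if isinstance(value, list):
                result[key] = [_unprefix(t) for t in value]
            else:
                result[key] = _unprefix(value)
        else:
            result[_unprefix(key)] = simplify(value)

    types = result.get("@type", [])
    if isinstance(types, str):
        types = [types]
    # Only distributions are touched: the dataset's own "url" stays as is.
    if "DataDownload" in types and "url" in result and "contentUrl" not in result:
        result["contentUrl"] = result.pop("url")
    return result


# Hypothetical dataset node with one nested distribution.
dataset = {
    "@type": "schema:Dataset",
    "schema:name": "Example report",
    "schema:url": "https://example.org/dataset/1",
    "schema:distribution": [
        {
            "@type": "schema:DataDownload",
            "schema:name": "report.pdf",
            "schema:url": "https://example.org/files/report.pdf",
        }
    ],
}

cleaned = simplify(dataset)
print(cleaned["distribution"][0])
```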
Framed JSON-LD
Notes
1) Keywords were lost along the way (topics beginning with uppercase letters)
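One way to confirm which keywords disappear is to diff the keyword sets of the original and framed documents. The sketch below assumes keywords sit under `schema:keywords` or `keywords` as a string or a list of strings; the function name `lost_keywords` and the sample keyword values are invented for illustration and are not taken from the actual dataset.

```python
def _as_list(value):
    """Normalise a JSON-LD value to a list (None becomes an empty list)."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def lost_keywords(original, framed, keys=("schema:keywords", "keywords")):
    """Return the keywords present in `original` but absent from `framed`."""

    def collect(node):
        found = set()
        for key in keys:
            for item in _as_list(node.get(key)):
                # schema.org allows a comma-separated string of keywords,
                # so split on commas (this breaks keywords containing commas).
                found.update(
                    part.strip() for part in str(item).split(",") if part.strip()
                )
        return found

    return sorted(collect(original) - collect(framed))


# Hypothetical before/after keyword values.
original = {
    "schema:keywords": ["Environment", "climate change", "Sustainable Development"]
}
framed = {"keywords": ["climate change"]}

lost = lost_keywords(original, framed)
print(lost)
# Check whether every lost keyword starts with an uppercase letter.
print(all(keyword[0].isupper() for keyword in lost))
```

If the second line prints `True` on real data, that would support the observation that only topics beginning with an uppercase letter are dropped.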
Dataset in Google Datasets
Found in Google Datasets: https://datasetsearch.research.google.com/search?src=0&query=site%3Apacificdata.org%20BPOA%202004
Notes:
1) Resource identified as PDF
2) Countries properly identified (Marshall Islands)
3) Not much more information
4) No authors
Dataset in OIH
OIH Search Link
https://oceaninfohub.org/results/Dataset?search_text=BPOA+2004
Notes:
1) Region wrongly identified (Latin America)
2) Some keywords are missing (removed while framing JSON-LD)
3) Distribution: is the name of the resource (file)
   a. Link on distribution value is broken. This is due to the missing “contentUrl” property
   b. Value becomes long if dataset has many resources, e.g.: https://oceaninfohub.org/results/Dataset?search_text=%22Fiji+Household+Income+and+Expenditure+Survey+2008%22&region=Oceania
4) Ignored values:
   a. Publisher information
   b. Modified date
   c. Publication date
5) Temporal coverage is supported (see other example)
related to #81