iodepo / odis-arch

Development of the Ocean Data and Information System (ODIS) architecture
https://book.oceaninfohub.org/

Report on comparing a PDH record in Google Dataset Search vs ODIS #277

Open issue, opened by jmckenna 12 months ago


Author

@stanozr

Date

2023-06-13

Description

We take a dataset found in OIH that comes from the Pacific Data Hub (PDH), look at its source record in the PDH, check the JSON-LD it produces, and compare that with how the record appears in OIH and in Google Dataset Search.

In CKAN we use the DCAT extension: https://extensions.ckan.org/extension/dcat/#structured-data-and-google-dataset-search-indexing

Dataset

Title

RMI Updated Report on the Barbados Programme of Action (BPOA), 2004

Data Source

Pacific Data dataset URL

https://pacificdata.org/data/dataset/rmi-updated-report-on-the-barbados-programme-of-action-bpoa3e18de38-4a91-4d92-80cb-550ba15a1179

Metadata (PDH)

Resource (download link in PDH)

https://pacific-data.sprep.org/system/files/RMI%2520update%2520report%2520on%2520BPOA_0.pdf

Notes

1) This is a document (dcat type = text), not structured data (dcat type = dataset).
2) This “dataset” contains only one resource.

JSON-LD

Full JSON-LD: https://pacificdata.org/data/dataset/rmi-updated-report-on-the-barbados-programme-of-action-bpoa3e18de38-4a91-4d92-80cb-550ba15a1179.jsonld
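The JSON-LD URL above is the dataset page URL with `.jsonld` appended. This appears to match the per-dataset RDF endpoints that ckanext-dcat exposes. The following is a minimal Python sketch for retrieving such a document and locating its Dataset node. It assumes that the same URL pattern holds for other PDH datasets and that the payload carries its nodes in an `@graph` array, as this record does. `jsonld_url`, `fetch_jsonld` and `find_dataset_node` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Hypothetical helpers: the ".jsonld" suffix pattern is inferred from the URLs in this issue.
DATASET_TYPES = {"schema:Dataset", "Dataset", "http://schema.org/Dataset", "https://schema.org/Dataset"}


def jsonld_url(dataset_url: str) -> str:
    """Build the JSON-LD URL for a PDH dataset page by appending '.jsonld' (assumed pattern)."""
    return dataset_url.rstrip("/") + ".jsonld"


def fetch_jsonld(url: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Download and parse a JSON-LD document (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def find_dataset_node(doc: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first @graph node typed as a schema.org Dataset, or None."""
    for node in doc.get("@graph", []):
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if DATASET_TYPES.intersection(types):
            return node
    return None


# Usage (performs a network request):
# doc = fetch_jsonld(jsonld_url(
#     "https://pacificdata.org/data/dataset/rmi-updated-report-on-the-barbados-programme-of-action-bpoa3e18de38-4a91-4d92-80cb-550ba15a1179"))
# dataset = find_dataset_node(doc)
```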

Here’s an extract of the JSON @graph object, cleaned up for readability (values nested instead of using @id node references, unused values removed, descriptions shortened):

{
  "@type": "schema:Dataset",
  "schema:contentLocation": {
    "@type": "schema:place",
    "schema:addressLocality": "Marshall Islands",
    "schema:addressRegion": "MH"
  },
  "schema:dateModified": "2022-02-15T00:00:00",
  "schema:datePublished": "2021-06-25T00:00:00",
  "schema:description": "An updated report that presents [...]",
  "schema:distribution": {
    "@type": "schema:DataDownload",
    "schema:description": "Updated report that presents a brief description [...]",
    "schema:encodingFormat": "PDF",
    "schema:name": "RMI Updated Report on the  BPOA, 2004",
    "schema:url": "https://pacific-data.sprep.org/system/files/RMI%2520update%2520report%2520on%2520BPOA_0.pdf"
  },
  "schema:includedInDataCatalog": {
    "@type": "schema:DataCatalog",
    "schema:description": "",
    "schema:name": "Pacific Data Hub",
    "schema:url": "https://pacificdata.org"
  },
  "schema:keywords": [
    "barbados-programme-of-action",
    "Environment",
    "sustainable-development",
    "Economic Development",
    "sd",
    "bpoa",
    "Land Resources"
  ],
  "schema:license": "https://pacific-data.sprep.org/dataset/data-portal-license-agreements/resource/de2a56f5-a565-481a-8589-406dc40b5588",
  "schema:name": "RMI Updated Report on the Barbados Programme of Action (BPOA), 2004",
  "schema:publisher": {
    "@type": "schema:Organization",
    "schema:name": "Climate Change Directorate",
    "schema:contactPoint": {
      "@type": "schema:ContactPoint",
      "schema:contactType": "customer service",
      "schema:email": "xxx@xxx.net",
      "schema:name": "Climate Change Directorate",
      "schema:url": "https://pacificdata.org"
    }
  },
  "schema:url": "https://pacificdata.org/data/dataset/rmi-updated-report-on-the-barbados-programme-of-action-bpoa3e18de38-4a91-4d92-80cb-550ba15a1179.jsonld"
}
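The extract was made readable by replacing `@id` references in the `@graph` with the nodes they point to. The following is a rough Python sketch of that cleanup step. It assumes a flat `@graph` in which a reference is an object whose only key is `@id`. `embed_references` is a hypothetical helper. Unlike the hand-cleaned extract, it keeps the `@id` keys. JSON-LD framing with embedding would be the more robust way to do this.

```python
from typing import Any, Dict, FrozenSet, List


def embed_references(graph: List[Dict[str, Any]], root_id: str) -> Dict[str, Any]:
    """Return node `root_id` with bare {"@id": ...} references replaced by the nodes they name."""
    nodes = {node["@id"]: node for node in graph if "@id" in node}

    def resolve(value: Any, seen: FrozenSet[str]) -> Any:
        if isinstance(value, list):
            return [resolve(v, seen) for v in value]
        if isinstance(value, dict):
            ref = value.get("@id")
            # A bare reference to a known node that is not already being expanded (avoids cycles).
            if ref is not None and len(value) == 1 and ref in nodes and ref not in seen:
                return resolve(nodes[ref], seen | {ref})
            # Build a new dict so the input graph is never mutated.
            return {k: resolve(v, seen) for k, v in value.items()}
        return value

    return resolve(nodes[root_id], frozenset({root_id}))
```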

Notes:

1) Properties are prefixed with “schema:” (valid, but unnecessary).
2) Google Dataset Search validates this record, but warns that the “url” properties (on the distribution) should be named “contentUrl”.
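Both notes could in principle be handled by post-processing the record before it is published. The following is a minimal Python sketch. It assumes the nested form shown above and a context of `{"@vocab": "http://schema.org/"}`. `normalize_record` and its helpers are hypothetical. A cleaner fix would likely sit in the ckanext-dcat schema.org mapping, or use JSON-LD compaction, rather than rewrite keys as strings. The sketch copies each `DataDownload`'s `url` into `contentUrl` and keeps `url` as well.

```python
from typing import Any, Dict

PREFIX = "schema:"


def strip_prefix(value: Any) -> Any:
    """Recursively drop the 'schema:' prefix from keys and from @type values."""
    if isinstance(value, list):
        return [strip_prefix(v) for v in value]
    if isinstance(value, dict):
        out = {}
        for key, val in value.items():
            new_key = key[len(PREFIX):] if key.startswith(PREFIX) else key
            if key == "@type":
                types = val if isinstance(val, list) else [val]
                stripped = [t[len(PREFIX):] if t.startswith(PREFIX) else t for t in types]
                out[new_key] = stripped if isinstance(val, list) else stripped[0]
            else:
                out[new_key] = strip_prefix(val)
        return out
    return value


def add_content_url(node: Any) -> Any:
    """In place: copy 'url' to 'contentUrl' on every DataDownload that lacks one."""
    if isinstance(node, list):
        for item in node:
            add_content_url(item)
    elif isinstance(node, dict):
        types = node.get("@type") or []
        if isinstance(types, str):
            types = [types]
        if "DataDownload" in types and "url" in node and "contentUrl" not in node:
            node["contentUrl"] = node["url"]
        for val in node.values():
            add_content_url(val)
    return node


def normalize_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical clean-up: unprefixed keys resolved via @vocab, plus contentUrl."""
    doc = strip_prefix(record)
    doc = {"@context": {"@vocab": "http://schema.org/"}, **{k: v for k, v in doc.items() if k != "@context"}}
    return add_content_url(doc)
```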

Framed JSON-LD

{
  "@context": {
    "@vocab": "http://schema.org/"
  },
  "@id": "https://pacificdata.org/data/dataset/3e18de38-4a91-4d92-80cb-550ba15a1179",
  "@type": "Dataset",
  "dateModified": [
    "2022-02-15T00:00:00",
    "2022-02-15"
  ],
  "datePublished": [
    "2021-06-25T00:00:00",
    "2021-06-25"
  ],
  "description": "An updated report that presents [...]",
  "distribution": {
    "@id": "https://pacificdata.org/data/dataset/3e18de38-4a91-4d92-80cb-550ba15a1179/resource/072f0a1b-cf39-4b45-b6a2-035107af4501",
    "@type": "DataDownload",
    "description": "Updated report that presents a brief description [...]",
    "encodingFormat": "PDF",
    "name": "RMI Updated Report on the  BPOA, 2004",
    "url": "https://pacific-data.sprep.org/system/files/RMI%2520update%2520report%2520on%2520BPOA_0.pdf"
  },
  "includedInDataCatalog": {
    "@type": "DataCatalog",
    "description": "",
    "name": "Pacific Data Hub",
    "url": "https://pacificdata.org"
  },
  "keywords": [
    "sustainable-development",
    "bpoa",
    "sd",
    "barbados-programme-of-action"
  ],
  "license": "https://pacific-data.sprep.org/dataset/data-portal-license-agreements/resource/de2a56f5-a565-481a-8589-406dc40b5588",
  "name": "RMI Updated Report on the  Barbados Programme of Action (BPOA), 2004",
  "publisher": {
    "@type": "Organization",
    "contactPoint": {
      "@type": "ContactPoint",
      "contactType": "customer service",
      "email": "xxx@xxx.net",
      "name": "['Climate Change Directorate']",
      "url": "https://pacificdata.org"
    },
    "name": "['Climate Change Directorate']"
  },
  "url": "https://pacificdata.org/data/dataset/rmi-updated-report-on-the-barbados-programme-of-action-bpoa3e18de38-4a91-4d92-80cb-550ba15a1179"
}

Notes

1) Some keywords were lost during framing: the ones beginning with an uppercase letter (“Environment”, “Economic Development”, “Land Resources”).
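A check like the following, run on the source record and the framed output, would catch this kind of loss automatically. This is a minimal Python sketch; `lost_values` and `looks_like_python_list` are hypothetical helpers. The framed output above also shows publisher and contact names serialized as a Python list literal (`"['Climate Change Directorate']"`), so the sketch includes a second check that flags such strings.

```python
import ast
from typing import Any, Iterable, List


def lost_values(before: Iterable[str], after: Iterable[str]) -> List[str]:
    """Values present before framing but missing afterwards, in their original order."""
    kept = set(after)
    return [v for v in before if v not in kept]


def looks_like_python_list(value: Any) -> bool:
    """True if a string is really a serialized Python list, e.g. "['A']"."""
    if not isinstance(value, str) or not value.startswith("["):
        return False
    try:
        return isinstance(ast.literal_eval(value), list)
    except (ValueError, SyntaxError):
        return False


if __name__ == "__main__":
    # Keyword lists copied from the two JSON-LD extracts in this issue.
    source_kw = ["barbados-programme-of-action", "Environment", "sustainable-development",
                 "Economic Development", "sd", "bpoa", "Land Resources"]
    framed_kw = ["sustainable-development", "bpoa", "sd", "barbados-programme-of-action"]
    print(lost_values(source_kw, framed_kw))  # ['Environment', 'Economic Development', 'Land Resources']
    print(looks_like_python_list("['Climate Change Directorate']"))  # True
```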

Dataset in Google Datasets

Found in Google Datasets: https://datasetsearch.research.google.com/search?src=0&query=site%3Apacificdata.org%20BPOA%202004

Notes:

1) The resource is identified as a PDF.
2) The country is correctly identified (Marshall Islands).
3) Little other information is shown.
4) No authors are listed.

Dataset in OIH

OIH Search Link

https://oceaninfohub.org/results/Dataset?search_text=BPOA+2004

Notes:

1) The region is wrongly identified (Latin America).
2) Some keywords are missing (removed while framing the JSON-LD).
3) Distribution: the displayed value is the name of the resource (file).
   a. The link on the distribution value is broken. This is due to the missing “contentUrl” property.
   b. The value becomes long if the dataset has many resources, e.g.: https://oceaninfohub.org/results/Dataset?search_text=%22Fiji+Household+Income+and+Expenditure+Survey+2008%22&region=Oceania
4) Ignored values:
   a. Publisher information
   b. Modified date
   c. Publication date
5) Temporal coverage is supported (see other example).
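Notes 3a and 4 suggest that the indexer reads only `contentUrl` for links and does not pick up the publisher and date fields. The OIH indexing code is not part of this issue, so the following is only a hypothetical Python sketch of a more tolerant extractor, not how OIH works today. It prefers `contentUrl` but falls back to `url`, reads the publisher name, and takes the first value when a date arrives as a list, as `dateModified` and `datePublished` do in the framed record. All function names are illustrative.

```python
from typing import Any, Dict, List, Optional


def first(value: Any) -> Any:
    """Return the first item of a list (None if empty), or the value itself."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def download_links(record: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
    """Name/link pairs for each distribution, preferring contentUrl over url."""
    dists = record.get("distribution", [])
    if isinstance(dists, dict):
        dists = [dists]
    return [{"name": d.get("name"), "link": d.get("contentUrl") or d.get("url")} for d in dists]


def summary(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical index entry built from a framed schema.org Dataset record."""
    publisher = record.get("publisher") or {}
    return {
        "name": record.get("name"),
        "publisher": first(publisher.get("name")) if isinstance(publisher, dict) else None,
        "dateModified": first(record.get("dateModified")),
        "datePublished": first(record.get("datePublished")),
        "downloads": download_links(record),
    }
```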

related to #81