iodepo / odis-arch

Development of the Ocean Data and Information System (ODIS) architecture
https://book.odis.org/

connect BMDC catalogue as ODIS node #248

Open jmckenna opened 1 year ago

jmckenna commented 1 year ago
ndevilleBE commented 1 year ago

Hello Jeff, I had a quick look; it seems our issue can be solved quickly:

sitemap: https://metadata.naturalsciences.be/geonetwork/srv/api/sitemap

HTML json-ld embedded info (taken from the sitemap):

https://metadata.naturalsciences.be/geonetwork/srv/api/records/9f4131b6-7895-403a-a17e-6bb33befaf16?language=all

Does that work for you? Cheers

jmckenna commented 1 year ago

Thanks for the quick changes @ndevilleBE !

At a quick glance, the sitemap looks good, but the embedded JSON-LD gives an error in the validator, in the Distribution section, because the name property has no value:

        {
        "@type":"DataDownload",
        "contentUrl":"https://www.marineatlas.be",
        "encodingFormat":"WWW:DOWNLOAD-1.0-http--download",
        "name": ,
        "description": "An HTTP link to download: MarineAtlas website"        }
        ,

I was testing with this record: https://metadata.naturalsciences.be/geonetwork/srv/api/records/9f4131b6-7895-403a-a17e-6bb33befaf16?language=all
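The error above is a plain JSON syntax problem, so it can be caught before a record ever reaches a schema.org validator. A minimal sketch, assuming only the Python standard library; the `check_jsonld` helper and the replacement name value are hypothetical and only illustrate the check:

```python
import json
from typing import Optional


def check_jsonld(text: str) -> Optional[str]:
    """Return None if the text parses as JSON, else a short error message."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Trimmed copy of the Distribution entry quoted above, with the empty "name".
broken = """{
  "@type": "DataDownload",
  "contentUrl": "https://www.marineatlas.be",
  "name": ,
  "description": "An HTTP link to download: MarineAtlas website"
}"""

# Hypothetical fix: any string value (or dropping the key) makes it parseable.
fixed = broken.replace('"name": ,', '"name": "MarineAtlas website",')

print(check_jsonld(broken))  # reports the position of the missing value
print(check_jsonld(fixed))   # None
```

This only confirms that the snippet is well-formed JSON; it says nothing about whether the properties are valid schema.org terms.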

jmckenna commented 1 year ago

Here is my second record test, which also has an error (it is set both as "@type": "schema:Dataset" and "@type": "schema:WebAPI") : https://metadata.naturalsciences.be/geonetwork/srv/api/records/mean_wave_direction_TS?language=all
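If the two "@type" values appear as a repeated key in the same JSON object (an assumption; the thread does not show the raw JSON-LD for this record), most parsers silently keep only the last one. A small sketch of how a repeated key could be detected with Python's standard `json` module; `find_duplicate_keys` is a hypothetical helper name:

```python
import json


def find_duplicate_keys(text):
    """Return the keys that occur more than once within a single JSON object."""
    duplicates = []

    def hook(pairs):
        # json.loads calls this hook once per object with its (key, value) pairs.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return duplicates


sample = '{"@type": "schema:Dataset", "name": "x", "@type": "schema:WebAPI"}'
print(find_duplicate_keys(sample))
```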

Speaking with @fils, ODIS unfortunately rejects any JSON-LD that has an error.

Hmm.

jmckenna commented 1 year ago

(validator used: https://validator.schema.org/ )

ndevilleBE commented 1 year ago

https://metadata.naturalsciences.be/geonetwork/srv/api/records/mean_wave_direction_TS?language=all now passes the schema.org test. I'm checking the other issue. Do you have a way to test all metadata at once? Cheers
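One possible way to test every record at once is to walk the sitemap and try to parse the JSON-LD embedded in each page. The sketch below is not an ODIS tool. It assumes the sitemap uses the standard sitemaps.org namespace and that each record page carries its JSON-LD in `<script type="application/ld+json">` elements. It only checks JSON syntax, not schema.org validity, and all function names are hypothetical:

```python
import json
import urllib.request
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return every <loc> URL found in a sitemap document (str or bytes)."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


class JsonLdExtractor(HTMLParser):
    """Collect the text of each <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def jsonld_errors(html_text):
    """Return one message per embedded JSON-LD block that fails to parse."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    errors = []
    for block in parser.blocks:
        try:
            json.loads(block)
        except json.JSONDecodeError as err:
            errors.append(f"line {err.lineno}: {err.msg}")
    return errors


def check_sitemap(sitemap_url):
    """Fetch a sitemap and map each failing record URL to its parse errors."""
    with urllib.request.urlopen(sitemap_url) as resp:
        urls = sitemap_urls(resp.read())
    report = {}
    for url in urls:
        with urllib.request.urlopen(url) as resp:
            errors = jsonld_errors(resp.read().decode("utf-8", errors="replace"))
        if errors:
            report[url] = errors
    return report
```

Calling `check_sitemap("https://metadata.naturalsciences.be/geonetwork/srv/api/sitemap")` would then list the records whose JSON-LD does not parse. If that URL returns a sitemap index rather than a flat list of records, the nested sitemaps would need to be followed first.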

ndevilleBE commented 1 year ago

Hello Jeff, FYI: https://catalogue.odis.org/view/3271 Cheers

jmckenna commented 1 year ago

@ndevilleBE perfect, thanks!

ndevilleBE commented 1 year ago

Hello @jmckenna, We added the missing term "name" in the JSON-LD metadata information. It's in dev mode only for the time being, as this is done together with other improvements on our side. We will push it to production this week or the week after if no surprises occur. Cheers

ndevilleBE commented 1 year ago

Hello @jmckenna, Could you send me the metadata file with all the empty @id in the JSON-LD representation? I believe it is generated by GeoNetwork but I need to verify it. Thanks,

jmckenna commented 1 year ago

Hey @ndevilleBE

In the meeting we were examining this record; I have pasted its JSON-LD below:

 {
  "@context": "http://schema.org/",
  "@type": "schema:Dataset",
  "@id": "https://metadata.naturalsciences.be/geonetwork/srv/api/records/bmdc.be:dataset:2721",
  "includedInDataCatalog": [
    {
      "url": "https://metadata.naturalsciences.be/geonetwork/srv/search#",
      "name": ""
    }
  ],
  "inLanguage": "eng",
  "name": "3D voxel model of the Belgian Continental Shelf",
  "dateCreated": [
    "2022-07-26T13:19:56Z"
  ],
  "dateModified": [
    "2022-07-28T12:05:11Z"
  ],
  "datePublished": [],
  "thumbnailUrl": [],
  "description": "Three-dimensional voxel model of the geological subsurface of the Belgian Continental Shelf containing information on probabilities of lithological classes (2: clay, 3: silt, 5: fine sand, 6: medium sand, 7: coarse sand and 8: gravel) and stratigraphy (1: Upper Holocene Nearshore, 2: Upper Holocene Offshore, 3: Lower Holocene, 4: Pleistocene and 5: Paleogene), estimated percentages of lithoclasses (clay, silt, mud, fine sand, medium sand, coarse sand, gravel and shells), and uncertainties (borehole density, entropie, positional quality, sampling quality and vintage).",
  "keywords": [
    "Geology",
    "Geo-Seas Udden-Wentworth scale",
    "Belgian part of the North Sea",
    "Belgian Exclusive Economic Zone"
  ],
  "author": [
    {
      "@id": "sumomdo@naturalsciences.be",
      "@type": "Organization",
      "name": "Royal Belgian Institute for Natural Sciences (RBINS), Directorate Natural Environment (OD Nature), Suspended Matter and Seabed Monitoring and Modelling (SUMO)",
      "email": "sumomdo@naturalsciences.be",
      "contactPoint": {
        "@type": "PostalAddress"
      }
    }
  ],
  "contributor": [],
  "creator": [],
  "provider": [
    {
      "@id": "bmdc@naturalsciences.be",
      "@type": "Organization",
      "name": "Royal Belgian Institute for Natural Sciences (RBINS), Directorate Natural Environment (OD Nature), Belgian Marine Data Centre (BMDC)",
      "email": "bmdc@naturalsciences.be",
      "contactPoint": {
        "@type": "PostalAddress",
        "addressCountry": "Belgium",
        "addressLocality": "Brussel",
        "postalCode": "1000",
        "streetAddress": "Vautierstraat 29"
      }
    },
    {
      "@id": "bmdc@naturalsciences.be",
      "@type": "Organization",
      "name": "Royal Belgian Institute for Natural Sciences (RBINS), Directorate Natural Environment (OD Nature), Belgian Marine Data Centre (BMDC)",
      "email": "bmdc@naturalsciences.be",
      "contactPoint": {
        "@type": "PostalAddress",
        "addressCountry": "Belgium",
        "addressLocality": "Brussel",
        "postalCode": "1000",
        "streetAddress": "Vautierstraat 29"
      }
    }
  ],
  "copyrightHolder": [],
  "user": [],
  "sourceOrganization": [],
  "publisher": [],
  "distribution": [
    {
      "@type": "DataDownload",
      "contentUrl": "https://www.bmdc.be/NODC/ditsAttach/datasource/7296/Belspo%20TILES_BE_20191014_ALL%20VARS.asc",
      "encodingFormat": "WWW:DOWNLOAD-1.0-http--download",
      "name": ,
      "description": "An HTTP link to download the dataset in CSV: 3D voxel model of the Belgian Continental Shelf (October 2019, all variables). BELSPO TILES Consortium"
    },
    {
      "@type": "DataDownload",
      "contentUrl": "https://www.bmdc.be/NODC/ditsAttach/datasource/7298/Belspo%20TILES_BE_DSS%20Export_2020.asc",
      "encodingFormat": "WWW:DOWNLOAD-1.0-http--download",
      "name": ,
      "description": "An HTTP link to download the dataset in CSV: 3D voxel model of the Belgian Continental Shelf (2020, Export decision support, main variables). BELSPO TILES Consortium"
    }
  ],
  "encodingFormat": [
    "text/csv"
  ],
  "spatialCoverage": [
    {
      "@type": "Place",
      "description": [],
      "geo": [
        {
          "@type": "GeoShape",
          "box": "50.89 1.31 52.09 3.68"
        }
      ]
    }
  ],
  "temporalCoverage": [
    "2018-01-01/"
  ],
  "license": [
    "https://creativecommons.org/publicdomain/zero/1.0/",
    {
      "@type": "CreativeWork",
      "name": "The data may be used and redistributed for free but is not intended for legal use, since it may contain inaccuracies. Neither the data Contributor, nor any of their employees or contractors, makes any warranty, express or implied, including warranties of merchantability and fitness for a particular purpose, or assumes any legal liability for the accuracy, completeness, or usefulness, of this information."
    },
    {
      "@type": "CreativeWork",
      "name": "No limitations on public access."
    }
  ]
}

ndevilleBE commented 1 year ago

We did check another dataset before. In this one I don't see the empty @id values that were an issue for your colleagues.

jmckenna commented 1 year ago

To be honest, I can't find the exact record. However, I do see that 95 records have errors when ODIS tries to harvest their JSON-LD. I wonder if you/we can first tackle removing the empty parameters, i.e. those that have no value (name, description, copyrightHolder, user, sourceOrganization, publisher, etc.), and then maybe I can more easily find the record with the missing @id.
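Removing the empty parameters amounts to a recursive filter over the JSON-LD structure. A minimal sketch in Python, applied to a trimmed copy of the record pasted above; `prune_empty` is a hypothetical helper, and it works on parsed JSON, so a syntax error such as `"name": ,` would have to be fixed where the JSON-LD is generated:

```python
EMPTY_VALUES = (None, "", [], {})


def prune_empty(node):
    """Recursively drop keys and list items whose value is None, "", [] or {}."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            value = prune_empty(value)
            if value not in EMPTY_VALUES:
                cleaned[key] = value
        return cleaned
    if isinstance(node, list):
        items = [prune_empty(item) for item in node]
        return [item for item in items if item not in EMPTY_VALUES]
    return node


# Trimmed copy of the record quoted earlier in this thread.
record = {
    "@type": "schema:Dataset",
    "name": "3D voxel model of the Belgian Continental Shelf",
    "includedInDataCatalog": [
        {"url": "https://metadata.naturalsciences.be/geonetwork/srv/search#", "name": ""}
    ],
    "datePublished": [],
    "publisher": [],
    "spatialCoverage": [
        {
            "@type": "Place",
            "description": [],
            "geo": [{"@type": "GeoShape", "box": "50.89 1.31 52.09 3.68"}],
        }
    ],
}

cleaned = prune_empty(record)
```

Objects that are left with only an "@type" (such as a contactPoint with no address fields) are kept by this sketch; whether those should also be dropped is a separate decision.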

ndevilleBE commented 1 year ago

Ok no worries. I'll let you know when the new metadata version is updated without empty fields.

ndevilleBE commented 1 year ago

Good morning @jmckenna , As I mentioned to you I'll be absent for 2 months. To avoid blocking the ingestion of our metadata in your portal, I put you in contact with my colleague Thomas Vandenberghe (@tvandenberghe), who is managing the ISO XML metadata generation. He'll let you know when the new version is available with the corrections so you can run a test on everything again. Thanks

jmckenna commented 1 year ago

Thanks @ndevilleBE, will watch for updates from Thomas. Enjoy your break.

tvandenberghe commented 1 year ago

Hi @jmckenna. Our harvester is updated and the JSON-LD now contains 'name'; see https://metadata.naturalsciences.be/geonetwork/srv/api/records/9f4131b6-7895-403a-a17e-6bb33befaf16 . It is only now that I see your full list of required fields, and these are still not filled in:

{
    "contributor": [],
    "copyrightHolder": [],
    "datePublished": [],
    "publisher": [],
    "sourceOrganization": [],
    "spatialCoverage": [
        {
            "@type": "Place",
            "description": [],
            "geo": [
                {
                    "@type": "GeoShape",
                    "box": "51.0937 2.292 51.527 3.27217"
                }
            ]
        }
    ],
    "user": []
}

I need to figure out how GeoNetwork populates these fields and what maps to them: 1) the GN system settings, 2) the original ISO XML, or 3) values hardcoded in the JSON-LD generation. We are planning on forking GN, which can help in figuring out 1) and 3).

tvandenberghe commented 1 year ago

Everything happens in GeoNetwork, and we have primary control over the content of all fields.

https://github.com/geonetwork/core-geonetwork/blob/0ce9c6033da5545756c48aaff5f6e31d7f1ba5f2/schemas/iso19139/src/main/plugin/iso19139/formatter/jsonld/iso19139-to-jsonld.xsl#L20

"contributor": [],
-> gmd:identificationInfo//gmd:pointOfContact/[gmd:role/gmd:CI_RoleCode/@codeListValue='processor']
-> this depends a lot on the situation and may need changes at record level

"copyrightHolder": [],
-> gmd:identificationInfo//gmd:pointOfContact/[gmd:role/gmd:CI_RoleCode/@codeListValue='owner']
-> RBINS

"datePublished": [],
-> gmd:identificationInfo//gmd:citation//gmd:date[/gmd:dateType//@codeListValue='publication']//gmd:date//text()
-> add publication date as well

"publisher": [],
-> gmd:identificationInfo//gmd:pointOfContact/[gmd:role/gmd:CI_RoleCode/@codeListValue='publisher']
-> RBINS

"sourceOrganization": [],
-> gmd:identificationInfo//gmd:pointOfContact/[gmd:role/gmd:CI_RoleCode/@codeListValue='principalInvestigator']
-> similar to author now, but depends a lot on the situation and may need changes at record level

"spatialCoverage/description": [],
-> gmd:identificationInfo//gmd:extent/[gmd:geographicElement] foreach gmd:description[count(.//text() != '') > 0]
-> when empty, filled with 'Bounding box'

"user": []
-> gmd:identificationInfo//gmd:pointOfContact/[gmd:role/gmd:CI_RoleCode/@codeListValue='user']
-> I don't see the point of us completing this. According to https://wiki.esipfed.org/ISO_19115-3_Codelists#CI_RoleCode it is someone who uses the resource. Isn't that you, ODIS?

I will make the necessary adaptations and get back to you.
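The role-to-property mapping listed above can be restated as a small lookup table. This is a Python paraphrase of the mapping as described in this comment, not the GeoNetwork XSL itself; the `(role, name)` input shape and the function name are assumptions made for illustration:

```python
# CI_RoleCode value -> schema.org property, as listed in the comment above.
ROLE_TO_PROPERTY = {
    "processor": "contributor",
    "owner": "copyrightHolder",
    "publisher": "publisher",
    "principalInvestigator": "sourceOrganization",
    "user": "user",
}


def contacts_to_properties(contacts):
    """Group (role, organisation name) pairs under their schema.org property.

    Properties with no matching contact are left out entirely instead of
    being emitted as empty lists.
    """
    properties = {}
    for role, name in contacts:
        prop = ROLE_TO_PROPERTY.get(role)
        if prop is None:
            continue  # role has no mapping in the table above
        properties.setdefault(prop, []).append(
            {"@type": "Organization", "name": name}
        )
    return properties


example = contacts_to_properties(
    [("owner", "RBINS"), ("publisher", "RBINS"), ("custodian", "RBINS")]
)
```

Here `example` contains only "copyrightHolder" and "publisher", since "custodian" is not in the table.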

jmckenna commented 1 year ago

@tvandenberghe thanks for your fix for the Distribution name. I will try to re-index your endpoint into ODIS and report back.

Regarding additional properties (that you mention above), those are optional, and it is up to you whether to include/expose or not. I'd say wait for my report on our re-index first, before trying to add additional properties (it could open up a new 'can of worms'). More soon...

jmckenna commented 1 year ago

@tvandenberghe initial harvesting results of your endpoint can now be found here

bmdc-oih-initial

tvandenberghe commented 1 year ago

That's really cool. One issue is that https://spatial.naturalsciences.be/geoserver/idod/ows?version=1.3.0&service=WMS&request=GetCapabilities is rendered with formatted ampersands, making the URL lead to nothing. Also, our URL does not explicitly refer to a single layer but to a GetCapabilities description of a whole namespace: the URL is in gmd:onLine/gmd:URL and the layer name is included in gmd:onLine/gmd:name. Would it be possible to render the distribution info as a complex object with the name included? We likely won't be the only ones doing it like this (this way gives the cleanest rendering in GeoNetwork).
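Assuming "formatted ampersands" means the URL is shown with HTML-escaped `&amp;` in place of `&` (the thread does not show the rendered string), the original URL can be recovered by unescaping before it is used as a link. A minimal Python sketch of the effect:

```python
import html
from urllib.parse import parse_qs, urlsplit

# Assumed rendering of the GetCapabilities URL with escaped ampersands.
escaped = (
    "https://spatial.naturalsciences.be/geoserver/idod/ows"
    "?version=1.3.0&amp;service=WMS&amp;request=GetCapabilities"
)

# html.unescape turns "&amp;" back into "&", restoring the query separators.
restored = html.unescape(escaped)
params = parse_qs(urlsplit(restored).query)

print(restored)
print(params["service"], params["request"])
```

With the escaped form, a server sees parameter names such as `amp;service` instead of `service`, which is consistent with the link leading nowhere.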

jmckenna commented 5 months ago

@tvandenberghe @ndevilleBE the issue with Distribution empty name is back, see this record :

  "distribution": [
        {
        "@type":"DataDownload",
        "contentUrl":"http://www.vliz.be/en/catalogue?module=ref&refid=41493",
        "encodingFormat":"WWW:LINK-1.0-http--link",
        "name": ,
        "description": "An HTTP link to view information on: Analyse van de levensgemeenschappen op het Belgisch continentaal plat: Studie van de epibenthale biocoenoses en van de demersale Pisces in en rondom de baggerzones . D. Maertens"        }
        ,

Can you take a look?