plazi / treatmentBank

Repository devoted to housekeeping of treatmentBank

Link to ODIS via JSON-LD/schema.org #115

Open pbuttigieg opened 4 months ago

pbuttigieg commented 4 months ago

Following on from discussions at BICIKL and subsequent technical calls, we're exploring how to expose ocean-relevant content (including coastal zones) to the IOC-UNESCO Ocean Data and Information System (ODIS).

Once the initial material has been reviewed by the TreatmentBank team, we can have another technical call to go through a few examples and create some reference JSON-LD documents. Following that, and the creation of a sitemap registered in ODISCat, we can then begin testing the harvest and dissemination to the ODIS Federation.
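The sitemap step can be sketched as follows. This is a minimal sketch, assuming the sitemap only needs to list the treatment landing pages that carry the JSON-LD, in the plain sitemaps.org 0.9 format. The helper name `build_sitemap` is hypothetical and not part of TreatmentBank, and whether ODIS expects anything beyond `<loc>` entries should be checked against the ODIS documentation before the sitemap is registered in ODISCat.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls):
    """Return a sitemap.xml document listing each page URL in a <url><loc> entry.

    Hypothetical helper: assumes the pages to harvest are the treatment
    landing pages that carry the embedded JSON-LD.
    """
    # Declaring xmlns as an attribute puts every child in the sitemap namespace.
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for page_url in page_urls:
        url_el = ET.SubElement(urlset, "url")
        # Characters such as "&" are escaped automatically on serialization.
        ET.SubElement(url_el, "loc").text = page_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        "https://tb.plazi.org/GgServer/html/423BD1464079EE7844C5FE28FE592E10",
        "https://tb.plazi.org/GgServer/html/087487942A7D070F69E47C51D425FE78",
    ]))
```

The resulting file would be served at a stable URL, and that URL is what gets registered in ODISCat.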

This issue will be cross-linked to a counterpart on the odis-arch tracker.

pbuttigieg commented 2 months ago

Pinging this issue: May is upon us.

myrmoteras commented 2 months ago

@pbuttigieg let's plan to discuss in the week of May 22. Can you send @gsautter some example input you would like to have, so we can study it and have an informed discussion?

pbuttigieg commented 1 month ago

@myrmoteras it seems the project schedules didn't align, but ODIS is now sustained under UNESCO, so we have more flexibility.

We've created a getting-started guide that may be enough for your developers to create the initial link:

https://book.oceaninfohub.org/gettingStarted.html
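One common way to publish such metadata, and the one assumed here, is to embed each JSON-LD document in the treatment's HTML landing page inside a `<script type="application/ld+json">` element. The function `jsonld_script_tag` is a hypothetical helper, not part of TreatmentBank or ODIS; the guide above remains the authority on what ODIS actually expects.

```python
import json


def jsonld_script_tag(doc):
    """Serialize a JSON-LD dict into a <script> element for an HTML landing page."""
    payload = json.dumps(doc, ensure_ascii=False, indent=2)
    # A literal "</" inside a string value could close the <script> element early;
    # "<\/" is an equivalent JSON escape that HTML parsers do not treat as a close tag.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    print(jsonld_script_tag({
        "@context": {"@vocab": "https://schema.org/"},
        "@type": "Dataset",
        "name": "Maera gujaratensis, Thacker & Myers & Trivedi, 2024",
        "url": "https://treatment.plazi.org/id/423BD146-4079-EE78-44C5-FE28FE592E10",
    }))
```

Escaping `</` matters because treatment text can contain arbitrary markup-like strings, and the JSON stays valid either way.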

myrmoteras commented 1 month ago

@pbuttigieg @gsautter can we meet sometime tomorrow after 4pm or Friday afternoon after 3pm and discuss next steps?

gsautter commented 1 month ago

Hi Donat,

> @pbuttigieg @gsautter can we meet sometime tomorrow after 4pm or Friday afternoon after 3pm and discuss next steps?

Sure, just tell me when, and whether to use Zoom or Skype. I personally prefer Skype, since the chat is far better suited for sharing links and files, and specifically for keeping them available and accessible after the meeting ends.

Best, Guido

myrmoteras commented 1 week ago

Hi Donat,

> Let's formulate a standard answer to these sorts of requests.
>
> Questions regarding duplication are increasing. The answer is not that it doesn't matter and that tools can de-duplicate, but that our GBIF record is different in that it links to a treatment and to publications. In other words, it shows how the data is being used. At the same time, there is work underway to create a digital specimen that includes all the links to the various representations of the physical specimen or original observation.
>
> In a sense, let me handle these sorts of requests, that is, forward them to me and I will take care of it.

OK, will do. I do understand the difference, but kind of failed to explain it that way ... sorry. I'll forward the next such request to you.

However, what's specifically strange about this one is that the specimen with the georeference is not the one the treatment is about ... the georeferenced specimen is only mentioned in passing, of sorts. How should we mark something like this, in general? See http://tb.plazi.org/GgServer/html/03F0943BFFCF4D070BF8FDEBFA9C1675

All the best, Guido

From: Guido Sautter gsautter@gmail.com
Sent: Tuesday, May 19, 2020 3:33 PM
To: Horton, Tammy tammy.horton@noc.ac.uk
Cc: Donat Agosti agosti@amnh.org
Subject: Re: Plazi georef incorrect?


Hi Tammy,

> I'm writing to ask about the following entry in GBIF:
>
> https://www.gbif.org/dataset/49a11228-6c4d-478f-b958-52610eaab951
>
> Which is one of my papers. The entry includes only one georeferenced specimen, which is not correct. The paper details the locations of all the samples examined and provides a full station list and map. The georeference provided is for a mention of a specimen not covered in our paper! How do we correct this?

Well, the treatment does cite that specimen, with its location given as a range of coordinates, and so we marked it up as such. In an automated process, there is little we can do to tell whether a specimen cited complete with a pair of coordinates is the one the treatment actually refers to, all the more so if the treatment's subject specimen comes without coordinates or other numeric detail data.

We can remove this materials citation if you want, but, treatment subject or not, it is a georeferenced specimen, and making it available as data is surely worthwhile.

> I would also like to ask about duplication that may be occurring through these uploads. I am currently working to prepare datasets for OBIS/GBIF of specimens held in the Discovery Collections. What will happen if I upload the data on these specimens to OBIS? We will be creating duplicates. I'm sure you will have come across this for other museums sharing specimen data to GBIF.

Actually, you are the first author voicing concerns about duplication. I am not sure whether GBIF does any duplicate removal or even reconciliation, especially in the absence of specimen codes, but the reality is that very few authors make their specimens available as machine-processable data, which is exactly why we extract said data from publications.

Also, I would not worry too much about avoiding duplicates, considering that catalogs like WoRMS and ITIS already have overlapping occurrence data and are still individual datasets in GBIF. And ultimately, if the detail data of any two occurrence records match up exactly, consumer applications can eliminate the duplicates. Better to have occurrence data available to the public, even at the risk of duplication, than to have it locked in publications altogether.
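The exact-match elimination a consumer application could apply might look like the sketch below. It is a minimal illustration, not anything GBIF or OBIS is known to run: the function name is hypothetical, and the choice of Darwin Core terms that count as "detail data" is an assumption. Records with none of those fields filled in are kept, since there is nothing to compare; partially filled records are compared as-is, including their missing values, which a real consumer might want to treat more cautiously.

```python
# Darwin Core term names used as the "detail data" to compare; which terms
# must match for two records to count as duplicates is an assumption.
DETAIL_FIELDS = ("scientificName", "decimalLatitude", "decimalLongitude",
                 "eventDate", "recordedBy")


def drop_exact_duplicates(records, fields=DETAIL_FIELDS):
    """Keep the first of any records whose detail fields all match exactly.

    Records with none of the fields filled in are always kept, because there
    is nothing to compare them on.
    """
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec.get(f) for f in fields)
        if all(v is None for v in key):
            unique.append(rec)  # no detail data: cannot be judged a duplicate
            continue
        if key in seen:
            continue  # exact match with an earlier record: drop it
        seen.add(key)
        unique.append(rec)
    return unique


if __name__ == "__main__":
    a = {"scientificName": "Maera gujaratensis", "eventDate": "2000-01-01"}
    print(len(drop_exact_duplicates([a, dict(a), {"scientificName": "Other"}])))
```

Because the first occurrence wins, the order in which datasets are fed in decides which publisher's record is kept; a consumer could instead merge the links from both records, which is where a treatment-linked record adds value.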

Kind regards, Guido Sautter

myrmoteras commented 1 week ago

Discussion from 20240709 Notes

gsautter commented 1 week ago

- Pier Luigi Buttigieg 3:07 PM: https://github.com/plazi/treatmentBank/issues/115
- Pier Luigi Buttigieg 3:11 PM: http://dashboard.oceaninfohub.org/ (you can find several implementations listed here). A panel below the main dashboard stats links to individual partners; their sitemaps lead to their asset catalogues.
- Pier Luigi Buttigieg 3:14 PM: https://docs.google.com/document/d/1AcUmonEaI0AznzeSKC22fbYdhd4718It00Q1ugJl1Xs/edit?usp=sharing
- You 3:15 PM: https://tb.plazi.org/GgServer/html/423BD1464079EE7844C5FE28FE592E10 https://tb.plazi.org/GgServer/xml/423BD1464079EE7844C5FE28FE592E10
- Donat Agosti 3:19 PM: https://zenodo.org/records/12698922
- Donat Agosti 3:25 PM: https://tb.plazi.org/GgServer/html/087487942A7D070F69E47C51D425FE78
- Donat Agosti 3:31 PM: https://tb.plazi.org/GgServer/html/087487942A7D070F69E47C51D425FE78
- Donat Agosti 3:39 PM: https://tb.plazi.org/GgServer/html//077587FDB8446427FF6CD93EED3FF8E5
- Donat Agosti 3:51 PM: https://synospecies.plazi.org/#Tyrannosaurus+rex
- Pier Luigi Buttigieg 4:00 PM: https://obis.org/dataset/95ba197f-f043-4980-ab3a-e4b3ad59b578 https://validator.schema.org/#url=https%3A%2F%2Fobis.org%2Fdataset%2F95ba197f-f043-4980-ab3a-e4b3ad59b578

pbuttigieg commented 2 days ago

@gsautter @myrmoteras

Here's an initial template, based on this PR https://github.com/iodepo/odis-in/pull/25, for Treatments as Datasets, built from one of the examples you provided.

Note that the schema:citation property is used heavily. This property references other CreativeWorks that are related to the one being described. The materials citations, treatment citations, and article citations (including figures, tables, etc.) can all go in there as typed nodes, which should allow you to add the metadata you need.
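A minimal sketch of such a document, built in Python so the citation nodes can be generated from treatment data. The node types (`ScholarlyArticle`, `Dataset`, `ImageObject`), the placeholder names, and the helper names `citation_node` and `treatment_dataset` are assumptions for illustration; the odis-in template in PR #25 is the reference for which types and properties to use.

```python
import json


def citation_node(node_type, name, **extra):
    """Build one typed node for schema:citation (type names are illustrative)."""
    node = {"@type": node_type, "name": name}
    node.update(extra)  # e.g. url=..., identifier=...
    return node


def treatment_dataset(name, url, identifier, citations):
    """Describe one treatment as a schema.org Dataset with typed citation nodes."""
    return {
        "@context": {"@vocab": "https://schema.org/"},
        "@type": "Dataset",
        "name": name,
        "url": url,
        "identifier": identifier,
        "citation": citations,
    }


if __name__ == "__main__":
    doc = treatment_dataset(
        name="Maera gujaratensis, Thacker & Myers & Trivedi, 2024",
        url="https://treatment.plazi.org/id/423BD146-4079-EE78-44C5-FE28FE592E10",
        identifier="423BD146-4079-EE78-44C5-FE28FE592E10",
        citations=[
            # Placeholder names: fill these from the actual treatment markup.
            citation_node("ScholarlyArticle", "Source article (placeholder)"),
            citation_node("Dataset", "Materials citation 1 (placeholder)"),
            citation_node("ImageObject", "Figure 1 (placeholder)"),
        ],
    )
    print(json.dumps(doc, indent=2))
```

Keeping each citation as its own typed node means per-node metadata (URLs, identifiers, coordinates for materials citations) can be added later without changing the surrounding Dataset.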

Note that I left some additional properties in there in case they are useful for other treatments. You can of course delete them if they are not relevant.

I think this is enough to begin sharing TreatmentBank's content via ODIS and related systems. If you have any questions please ping me in https://github.com/iodepo/odis-in/pull/25.

pbuttigieg commented 2 days ago

@myrmoteras @gsautter

If you wish to be more literal about taxonomic concepts as claims, you can use this pattern to associate the Claim that a taxon is present with Treatments wherein the claim appears.

If each treatment's JSON-LD has its own @id (a URL pointing to the JSON-LD file describing the treatment), then you can reference the treatments each claim refers to by just pointing to the @id:

    "appearance": [
      {
        "@id": "https://treatment.plazi.org/json-ld/url-to-a-json-ld-representation-of-some-treatments-metadata/"
      },
      ...
    ]
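The same pattern can be generated programmatically. The hypothetical `taxon_presence_claim` helper below emits a schema.org `Claim` whose `appearance` list holds bare `@id` references to each treatment's JSON-LD document. The claim's `name` wording and the `about` node typed as `Taxon` are assumptions, not part of the template in this thread; only the `appearance`/`@id` structure comes from it.

```python
import json


def taxon_presence_claim(taxon_name, treatment_jsonld_urls):
    """Claim that a taxon is present, pointing at each treatment by its @id."""
    return {
        "@context": {"@vocab": "https://schema.org/"},
        "@type": "Claim",
        # Hypothetical wording and 'about' node; only 'appearance' comes from the thread.
        "name": f"{taxon_name} is present",
        "about": {"@type": "Taxon", "name": taxon_name},
        # Each entry is a bare reference to a treatment's own JSON-LD document.
        "appearance": [{"@id": u} for u in treatment_jsonld_urls],
    }


if __name__ == "__main__":
    claim = taxon_presence_claim(
        "Maera gujaratensis",
        ["https://treatment.plazi.org/json-ld/url-to-a-json-ld-representation-of-some-treatments-metadata/"],
    )
    print(json.dumps(claim, indent=2))
```

Referencing by `@id` keeps each claim small and lets a harvester resolve the full treatment metadata only when it needs it.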

If the JSON-LD files for Treatments don't have their own URLs/IRIs, you could instead include metadata about them verbatim:

    "appearance": [
      {
        "@type": "Dataset",
        "@id": "URL: Optional. A URL that resolves to *this* JSON-LD document, NOT the URL of the Dataset that this JSON-LD document describes. To link to the Dataset itself, please use 'url' and/or 'identifier'.",
        "name": "Maera gujaratensis, Thacker & Myers & Trivedi, 2024",
        "description": "A dataset representing a TreatmentBank Treatment record of Maera gujaratensis, see https://plazi.org/treatmentbank/what-treatment/",
        "url": "https://treatment.plazi.org/id/423BD146-4079-EE78-44C5-FE28FE592E10",
        "identifier": "423BD146-4079-EE78-44C5-FE28FE592E10"
      },
      {
        "@type": "Dataset",
        "@id": "URL: Optional. A URL that resolves to *this* JSON-LD document, NOT the URL of the Dataset that this JSON-LD document describes. To link to the Dataset itself, please use 'url' and/or 'identifier'.",
        "name": "Maera gujaratensis, Smith & Jones & Li, 2024",
        "description": "A dummy dataset representing another TreatmentBank Treatment record of Maera gujaratensis, see https://plazi.org/treatmentbank/what-treatment/",
        "url": "https://treatment.plazi.org/id/MADE-UP-423BD146-4079-EE78-44C5-FE28FE592E10",
        "identifier": "MADE-UP-423BD146-4079-EE78-44C5-FE28FE592E10"
      },
      ...
    ]
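A sketch of choosing between the two patterns per treatment: reference it by `@id` when its JSON-LD has its own URL, otherwise inline a `Dataset` node. It assumes, based only on the examples in this thread, that the 32-hex-digit IDs in tb.plazi.org URLs map to the dashed IDs under https://treatment.plazi.org/id/ by 8-4-4-4-12 grouping (423BD1464079EE7844C5FE28FE592E10 and 423BD146-4079-EE78-44C5-FE28FE592E10). The optional `@id` is left out of inlined nodes, since it must point at a JSON-LD document rather than the Dataset itself. Both helper names are hypothetical.

```python
import re

_HEX32 = re.compile(r"[0-9A-Fa-f]{32}")


def dashed_treatment_id(raw_id):
    """Turn a 32-hex-digit treatment ID into the 8-4-4-4-12 dashed form.

    Assumes the grouping seen in this thread, e.g.
    423BD1464079EE7844C5FE28FE592E10 -> 423BD146-4079-EE78-44C5-FE28FE592E10.
    """
    if not _HEX32.fullmatch(raw_id):
        raise ValueError(f"not a 32-digit hex ID: {raw_id!r}")
    s = raw_id.upper()
    return "-".join((s[0:8], s[8:12], s[12:16], s[16:20], s[20:32]))


def appearance_entry(raw_id, name, jsonld_url=None):
    """Reference a treatment by @id if its JSON-LD has a URL, else inline it."""
    if jsonld_url is not None:
        return {"@id": jsonld_url}
    tid = dashed_treatment_id(raw_id)
    return {
        "@type": "Dataset",
        "name": name,
        "description": ("A dataset representing a TreatmentBank Treatment record, "
                        "see https://plazi.org/treatmentbank/what-treatment/"),
        "url": f"https://treatment.plazi.org/id/{tid}",
        "identifier": tid,
    }


if __name__ == "__main__":
    print(appearance_entry("423BD1464079EE7844C5FE28FE592E10",
                           "Maera gujaratensis, Thacker & Myers & Trivedi, 2024"))
```

Switching to the `@id` form later only requires passing `jsonld_url` once per-treatment JSON-LD URLs exist.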