plazi / treatmentBank

Repository devoted to house keeping of treatmentBank

Add link to Checklistbank and SIBiLS UI, as well from BLR #90

Open myrmoteras opened 1 year ago

myrmoteras commented 1 year ago

Treatments are reused in checklistbank and SIBiLS biodiversity PMC. From those instances there is a link to the source treatment.

Can we add a reciprocal link to those in the TB UI, as well as into our stats, and export them to BLR?

ChecklistBank and SIBiLS use our treatmentUUID, which might facilitate the linking, though it might need a check that the treatment is actually in CLB and SIBiLS.

Similarly, we could add the CLB and SIBiLS "PID".

This would also be in line with the coordination with Pensoft, EJT, Zoobank, as well as add another linking system to BiCIKL.

It would also be helpful to add the respective links to tb stats


myrmoteras commented 1 year ago

@emilie19 would it be possible to get a URL that links to a specific treatment in SIBiLS, something like https://sibils.text-analytics.ch/search/?query=Rhinolophus&search_size=1000#results-section, that we can use for linking?

mdoering commented 1 year ago

It will not only be convenient for users, but also improve the web/Google search ranking for all of us.

gsautter commented 1 year ago

Sure possible ... provided there is a way of getting our hands on this link, like, a way of performing a lookup with the Plazi UUID, DOI, or something similar ... just need a URL to query ... and if possible an approximate time frame after a new export when the ID in question becomes available.

mdoering commented 1 year ago

We can probably set up a resource to link to that just takes your treatment ID. Until that is in place you might be able to use our API to retrieve the datasetKey you need for linking to the webpage?

This API call looks up your treatment ID with Plazi's GBIF publisher key and should therefore always return a single result or none: https://api.checklistbank.org/nameusage/search?limit=2&publisherKey=7ce8aef0-9e92-11dc-8738-b8a03c50a862&usageID=03EC879FFF9AFFE8AB77AEDFFF4B17C8.taxon

This can then be used to construct the UI link which needs the datasetKey (=58039) and the taxon ID (your treatment ID): https://www.checklistbank.org/dataset/58039/taxon/03EC879FFFB1FFCEA875AF4BFDF21428.taxon
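As a rough illustration of the two URL patterns above, here is a minimal Python sketch that only builds the lookup URL and the UI link. The function names are hypothetical, nothing is fetched, and the shape of the API response is deliberately not assumed; the datasetKey is simply passed in once it is known.

```python
from urllib.parse import urlencode

CLB_API = "https://api.checklistbank.org"
CLB_UI = "https://www.checklistbank.org"
# Plazi's GBIF publisher key, as quoted in the API call above.
PLAZI_PUBLISHER_KEY = "7ce8aef0-9e92-11dc-8738-b8a03c50a862"


def taxon_id(treatment_uuid: str) -> str:
    """Taxon ID as exported in the DwC-A: the treatment UUID plus '.taxon'."""
    return f"{treatment_uuid}.taxon"


def clb_lookup_url(usage_id: str) -> str:
    """Name usage search restricted to Plazi's publisher key (hypothetical helper)."""
    query = urlencode(
        {"limit": 2, "publisherKey": PLAZI_PUBLISHER_KEY, "usageID": usage_id}
    )
    return f"{CLB_API}/nameusage/search?{query}"


def clb_taxon_page_url(dataset_key, usage_id: str) -> str:
    """UI link built from the CLB datasetKey and the taxon ID (hypothetical helper)."""
    return f"{CLB_UI}/dataset/{dataset_key}/taxon/{usage_id}"


if __name__ == "__main__":
    uid = taxon_id("03EC879FFFB1FFCEA875AF4BFDF21428")
    print(clb_lookup_url(uid))
    print(clb_taxon_page_url(58039, uid))
```

Since both URLs are derived purely from the treatment UUID and the datasetKey, they could be generated on the fly rather than stored with the treatment.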

mdoering commented 1 year ago

Note the extra .taxon suffix which I strongly think we should remove

gsautter commented 1 year ago

We can probably set up a resource to link to that just takes your treatment ID. Until that is in place you might be able to use our API to retrieve the datasetKey you need for linking to the webpage?

This API call looks up your treatment ID with Plazi's GBIF publisher key and should therefore always return a single result or none: https://api.checklistbank.org/nameusage/search?limit=2&publisherKey=7ce8aef0-9e92-11dc-8738-b8a03c50a862&usageID=03EC879FFF9AFFE8AB77AEDFFF4B17C8.taxon

This can then be used to construct the UI link which needs the datasetKey (=58039) and the taxon ID (your treatment ID): https://www.checklistbank.org/dataset/58039/taxon/03EC879FFFB1FFCEA875AF4BFDF21428.taxon

@mdoering sure, this pattern is clear, and since CLB is reusing the taxon ID from the DwC-A, that's easy enough to generate, too ... would even fancy to do so on the fly instead of storing an actual property in the data ...

My question where to get the ID from was more directed towards SIBiLS, where I have no real knowledge of the identifiers at this point ...

gsautter commented 1 year ago

Note the extra .taxon suffix which I strongly think we should remove

@mdoering see my comment in https://github.com/CatalogueOfLife/checklistbank/issues/1227 ... easy enough to do at the technical level, but I'm afraid it might break things in all sorts of places, not least the (association with the) record keys in GBIF ...

mdoering commented 1 year ago

You are right, the backbone needs an update. But that would be easy to do: I can simply remove the .taxon suffix in the Postgres tables.

myrmoteras commented 1 year ago

@gsautter Aren't uuid and uuid.taxon two different entities? UUID is the treatment, whilst UUID.taxon the taxonomic name in the nomenclature section, ie the equivalent to the taxonomic name in CLB?!

mdoering commented 1 year ago

A dwc:Taxon is a name usage, which essentially is a treatment. Ideally even the "treatment citations", which we treat as synonyms, would cite the original treatment, i.e. the usage ID. The name is another entity with no identifier currently given in the Plazi DwC-As, which is fine.

In general it is also fine to have the same identifier value for different types of objects or in different datasets. You very often find 1 as the taxon, name and reference ID. Clearly there are systems like LOD where this is not desirable and you regularly turn these into globally unique URIs, e.g. bio.org/ref/1, bio.org/name/1 and bio.org/taxon/1. DwC archives do not have that assumption or requirement, so even the same UUID could be used for different things.
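To make the LOD point concrete, here is a small Python sketch, using the bio.org example from above, of how one local ID shared by several object types can be turned into distinct URIs by adding a type segment. The helper name is hypothetical.

```python
def scoped_uri(base: str, kind: str, local_id: str) -> str:
    """Combine a base, an object type and a local ID into one identifier."""
    return f"{base.rstrip('/')}/{kind}/{local_id}"


# The same local ID "1" used for a reference, a name and a taxon.
local_id = "1"
uris = {
    kind: scoped_uri("bio.org", kind, local_id)
    for kind in ("ref", "name", "taxon")
}

if __name__ == "__main__":
    for kind, uri in uris.items():
        print(kind, uri)
```

Inside a DwC archive the three objects can all keep the plain ID 1, because the file they appear in already tells you which type of object is meant.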

mdoering commented 1 year ago

Maybe we can also link from BLR to ChecklistBank in addition to GBIF? The dataset keys in CLB won't change once assigned. https://zenodo.org/record/4272771#.ZFs6Vy8Rr0o

gsautter commented 1 year ago

Maybe we can also link from BLR to ChecklistBank in addition to GBIF? The dataset keys in CLB won't change once assigned. https://zenodo.org/record/4272771#.ZFs6Vy8Rr0o

If the dataset key in CLB is the same as in GBIF, that should be rather straightforward ... if not, we need to find a way of getting the CLB dataset keys into the IMFs first, I'm afraid ...

mdoering commented 1 year ago

It is a different integer I am afraid. But we can do a lookup inside CLB and should also be able to expose the API and UI with a GBIF key, e.g. https://www.checklistbank.org/dataset/gbif-bfb878f3-8a74-46d3-a104-36485c32aaba/taxon/03EC879FFFB1FFCEA875AF4BFDF21428.taxon

gsautter commented 1 year ago

It is a different integer I am afraid. But we can do a lookup inside CLB and should also be able to expose the API and UI with a GBIF key, e.g. https://www.checklistbank.org/dataset/gbif-bfb878f3-8a74-46d3-a104-36485c32aaba/taxon/03EC879FFFB1FFCEA875AF4BFDF21428.taxon

From a technical point of view, that gbif- prefix should be easy enough to add ... however, it is kind of ugly, so getting the actual CLB issued dataset key would most likely be preferable ... is there a way of making a translation via the CLB API? Would only need the dataset key, as the taxon key proper is straightforward.

mdoering commented 1 year ago

The simplest would be to call https://api.checklistbank.org/dataset?gbifKey=bfb878f3-8a74-46d3-a104-36485c32aaba and retrieve the dataset key from there. It is static, so you can store it forever. Or the upcoming https://api.checklistbank.org/dataset/gbif-bfb878f3-8a74-46d3-a104-36485c32aaba call.
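A minimal Python sketch of this lookup, assuming nothing about the response format: it builds the two URL forms mentioned above and caches the resolved dataset key, since the key is static. The class and function names are hypothetical, and the actual HTTP call is left to a caller-supplied function.

```python
from typing import Callable, Dict
from urllib.parse import urlencode

CLB_API = "https://api.checklistbank.org"
CLB_UI = "https://www.checklistbank.org"


def clb_dataset_lookup_url(gbif_key: str) -> str:
    """Dataset search filtered by GBIF dataset key."""
    return f"{CLB_API}/dataset?{urlencode({'gbifKey': gbif_key})}"


def clb_taxon_url_by_gbif_key(gbif_key: str, usage_id: str) -> str:
    """UI link using the 'gbif-' prefixed alias instead of the CLB integer key."""
    return f"{CLB_UI}/dataset/gbif-{gbif_key}/taxon/{usage_id}"


class DatasetKeyCache:
    """Resolve each GBIF key once and keep the CLB key, as it does not change."""

    def __init__(self, fetch: Callable[[str], int]):
        # 'fetch' is supplied by the caller, e.g. a function that requests
        # clb_dataset_lookup_url(...) and extracts the key from the response.
        self._fetch = fetch
        self._keys: Dict[str, int] = {}

    def get(self, gbif_key: str) -> int:
        if gbif_key not in self._keys:
            self._keys[gbif_key] = self._fetch(gbif_key)
        return self._keys[gbif_key]


if __name__ == "__main__":
    print(clb_dataset_lookup_url("bfb878f3-8a74-46d3-a104-36485c32aaba"))
```

In a real deployment the cache would presumably be persisted alongside the dataset record rather than held in memory.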

myrmoteras commented 1 year ago

@mdoering just out of curiosity: why do I get a 404 in the main window using the link above?

mdoering commented 1 year ago

Because that is not yet deployed. I only coded it this morning to enable a simple linking via gbif keys

gsautter commented 1 year ago

The simplest would be to call https://api.checklistbank.org/dataset?gbifKey=bfb878f3-8a74-46d3-a104-36485c32aaba and retrieve the dataset key from there. It is static, so you can store it forever. Or the upcoming https://api.checklistbank.org/dataset/gbif-bfb878f3-8a74-46d3-a104-36485c32aaba call.

OK, that's easy enough ... will we have to add some delay timer, though? Don't think this call will yield a useful result right after newly registering a DwCA to the GBIF API ... or should we even consider registering to CLB directly, basically rendering it an independent sibling of the GBIF uplink?

gsautter commented 1 year ago

I feel like this issue is related, as both concern uplink and sending notifications to CLB: https://github.com/plazi/community/issues/237

mdoering commented 1 year ago

We were discussing this in GBIF and it seems in the future we should make ChecklistBank the primary target for all checklist activity and thus to register checklist datasets there primarily. But that thinking hasn't really been finalised. For now the GBIF registry and CLB both track datasets with CLB doing a sync every 6h. We could increase that to 1h I suppose. And yes, you could also register a dataset directly with us after it has been registered with GBIF, so we know the GBIF UUID already and do not create duplicates during a registry sync. That would of course guarantee the immediate presence of all the plazi datasets.

gsautter commented 1 year ago

And yes, you could also register a dataset directly with us after it has been registered with GBIF, so we know the GBIF UUID already and do not create duplicates during a registry sync. That would of course guarantee the immediate presence of all the plazi datasets.

That should be possible, I guess ... and also would provide a way of getting the CLB dataset ID into TreatmentBank.

For now the GBIF registry and CLB both track datasets with CLB doing a sync every 6h. We could increase that to 1h I suppose.

For the vast majority of datasets, this sounds like a bit of overkill, to be honest ... we might rather want to add a notification call telling CLB that a dataset was updated ... way more scalable than polling, and more timely as well.

myrmoteras commented 1 year ago

@gsautter where do we stand with adding a link to the treatments in SIBiLS? I think we had discussions about this, but I can't find the notes.

see also https://github.com/plazi/treatmentBank/issues/90#issuecomment-1539613973

Maybe @emilie19 or Julien know?

gsautter commented 1 year ago

We can add a link to the SIBiLS search for the UUID, which then shows a list with a single document that you have to expand yourself to look at ... see https://sibils.text-analytics.ch/search/collections/plazi/C11B87BDFF80BF199994FF00FA307514 ... AFAIK, there is no direct link, and no way of displaying a treatment other than this search result page ... we can certainly link to that if you think it looks good next to the treatments on Zenodo, TB, GBIF, etc.
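For illustration, a minimal Python sketch of generating that search link from a treatment UUID. The function name is hypothetical, and the check for 32 hexadecimal characters is an assumption based only on the UUIDs quoted in this thread. The sketch cannot tell whether a given treatment is actually present in SIBiLS.

```python
import re

# Search URL prefix taken from the example link above.
SIBILS_PLAZI_SEARCH = "https://sibils.text-analytics.ch/search/collections/plazi"

# Assumption: treatment UUIDs are 32 hex characters, as in the examples here.
_UUID_PATTERN = re.compile(r"[0-9A-Fa-f]{32}")


def sibils_search_url(treatment_uuid: str) -> str:
    """Build the SIBiLS search link for one treatment UUID (hypothetical helper)."""
    if not _UUID_PATTERN.fullmatch(treatment_uuid):
        raise ValueError(f"not a 32-character hex UUID: {treatment_uuid!r}")
    return f"{SIBILS_PLAZI_SEARCH}/{treatment_uuid.upper()}"


if __name__ == "__main__":
    print(sibils_search_url("C11B87BDFF80BF199994FF00FA307514"))
```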

myrmoteras commented 1 year ago

For now, that would be helpful, assuming you know, which treatments are in SIBiLS "biodivPMC".

At some point, SIBiLS needs to think about assigning something like a persistent identifier, or at least a prefix that will persist. Take https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7197271/ as a model: https://www.ncbi.nlm.nih.gov/pmc/articles/ would be the prefix, and the rest would just be the Plazi UUID:

https://sibils.text-analytics.ch/arciles/0B2B87C9FFDDFFE3FEE2C38EB681FE7B

instead of https://sibils.text-analytics.ch/search/collections/plazi/0B2B87C9FFDDFFE3FEE2C38EB681FE7B

gsautter commented 1 year ago

For now, that would be helpful, assuming you know, which treatments are in SIBiLS "biodivPMC".

Trouble is, we do not ... there is no dedicated ID to import back like e.g. on Zenodo, as they simply use ours.

At some point, SIBiLS needs to think about assigning something like a persistent identifier, or at least a prefix that will persist. Take https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7197271/ as a model: https://www.ncbi.nlm.nih.gov/pmc/articles/ would be the prefix, and the rest would just be the Plazi UUID.

That would be a good thing, yes ... plus, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7197271/ takes you to the page for the actual article, and you see right away that you're in the right spot ... the link to the SiBILS search takes you to a page that is so lavishly spaced that you have to scroll down to find what you linked to via a UUID in the search result area ...

https://sibils.text-analytics.ch/arciles/0B2B87C9FFDDFFE3FEE2C38EB681FE7B

This gives me a "404 - Not Found" ...

instead of https://sibils.text-analytics.ch/search/collections/plazi/0B2B87C9FFDDFFE3FEE2C38EB681FE7B

myrmoteras commented 1 year ago

On Monday, June 5, 2023 2:20 PM, Guido Sautter wrote:

For now, that would be helpful, assuming you know, which treatments are in SIBiLS "biodivPMC".

Trouble is, we do not ... there is no dedicated ID to import back like e.g. on Zenodo, as they simply use ours.

Does that mean we do not know what we exported to SIBiLS? Isn't there a gatekeeper that keeps about 50% of the treatments from being exported to SIBiLS? So you should have an idea?!

At some point, SIBiLS needs to think about assigning something like a persistent identifier, or at least a prefix that will persist. Take https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7197271/ as a model: https://www.ncbi.nlm.nih.gov/pmc/articles/ would be the prefix, and the rest would just be the Plazi UUID.

That would be a good thing, yes ... plus, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7197271/ takes you to the page for the actual article, and you see right away that you're in the right spot ... the link to the SiBILS search takes you to a page that is so lavishly spaced that you have to scroll down to find what you linked to via a UUID in the search result area ...

https://sibils.text-analytics.ch/arciles/0B2B87C9FFDDFFE3FEE2C38EB681FE7B

This gives me a "404 - Not Found" ...

Yes, because I made it up as example.

instead of https://sibils.text-analytics.ch/search/collections/plazi/0B2B87C9FFDDFFE3FEE2C38EB681FE7B
