nfdi4plants / ARCtrl

Library for management of Annotated Research Contexts (ARCs) using an in-memory representation and runtime-agnostic contract systems.
MIT License

[Collection] IRI Generation #368

Closed Freymaurer closed 4 months ago

Freymaurer commented 4 months ago

I think we should finally create a unified logic for URI generation.

The current logic, shown here, might not be sufficient to handle all the different kinds of URL.

Below I will try to summarize the requirements:

TS4TIB

An external ontology service in cooperation with DataPLANT:

quoting @Hannah-Doerpholz

So, I have created all purls for the terms that are currently in DPBO. The ontology repo now also has an automated workflow that creates new purls whenever new DPBO terms are added to the .obo file. The purl checker + creation runs once per week, every Saturday, since it takes a while to run.

All ontologies are included except:

  • ARC
  • MIAPPE (our homebrew version)
  • CREDiT
  • NCBITaxon (both the full one as well as our homebrew one)

Everything else that we currently import through the ext_ontologies.include, as well as our DPBO, is in the TIB.

I will close all related issues to track them here. @Hannah-Doerpholz please verify if this issue roughly sums up the requirements 🙂

Hannah-Doerpholz commented 4 months ago

The summary looks good to me! I'll see if we can't also add ARC, our MIAPPE and CREDiT into the TS4TIB. That would take a while though. I'll update you on any changes.

Freymaurer commented 4 months ago

@Hannah-Doerpholz I just remembered that MS term URLs are also broken:

Example: https://ontobee.org/ontology/MS?iri=http://purl.obolibrary.org/obo/MS_1000031

Do we have any replacement for this?

HLWeil commented 4 months ago

I will implement a hardcoded logic to cover the cases where we know that the standard PURL is wrong and we have a functioning alternative.

Hannah-Doerpholz commented 4 months ago

@HLWeil Thank you! @Freymaurer I know they are broken, but that is not something we can resolve. I contacted the MS maintainers, OBO Foundry and Ontobee, since I didn't know where exactly the issue is. OBO Foundry says that the purls are fine and that the problem is likely with Ontobee, Ontobee says that the issue is probably with the ms.owl file, and the MS maintainers haven't responded at all.

A workaround for MS would be to not rely on the purls but link directly to OLS4, since the terms are displayed correctly there. That would mean the following:

old MS link: http://purl.obolibrary.org/obo/MS_1002809
new MS link: https://www.ebi.ac.uk/ols4/ontologies/ms/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMS_1002809

Hannah-Doerpholz commented 4 months ago

Another option might be to go through Bioregistry. Bioregistry is an identifier resolver. Here, there is always some RDF information about what the URI format for a term should look like. For example, for ENVO:

http://purl.obolibrary.org/obo/ENVO_$1
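
To illustrate how such a URI format could be consumed, here is a minimal Python sketch that fills the `$1` placeholder with a term's local ID. The function name `expand_uri_format` is hypothetical and not part of ARCtrl or Bioregistry; the sketch only assumes that `$1` marks where the local ID goes, as in the ENVO pattern above.

```python
def expand_uri_format(uri_format: str, local_id: str) -> str:
    """Fill a Bioregistry-style URI format with a local ID (hypothetical helper)."""
    # Assumption: "$1" is the only placeholder and stands for the local ID.
    if "$1" not in uri_format:
        raise ValueError(f"URI format has no $1 placeholder: {uri_format!r}")
    return uri_format.replace("$1", local_id)


envo_format = "http://purl.obolibrary.org/obo/ENVO_$1"
print(expand_uri_format(envo_format, "09200010"))
```

Keeping the format string as data means that a per-ontology table of formats could drive URI generation without extra code per prefix.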

Our imported ontologies that are in Bioregistry: ENVO, PSI-MS (Prefix MS), CHEBI, GO, OBI, PATO, PECO, PO (purls broken), RO (purls broken), TO, UO, PSI-MOD (prefix MOD), EFO, NCIT, OMP

Ontologies we host on GitHub ourselves that are in Bioregistry: CRO (the credit ontology), NCBITaxon

Ontologies we host on GitHub that are NOT in Bioregistry (I could add them though): DPBO, ARC_v3.0, MIAPPE

For PO and RO, the same workaround as for MS could be used:

old PO link: http://purl.obolibrary.org/obo/PO_0007033
new PO link: https://www.ebi.ac.uk/ols4/ontologies/po/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPO_0007033

old RO link: http://purl.obolibrary.org/obo/RO_0002533
new RO link: https://www.ebi.ac.uk/ols4/ontologies/ro/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FRO_0002533?lang=en

HLWeil commented 4 months ago

Thanks a lot for your thorough input, @Hannah-Doerpholz!

So maybe we could use bioregistry in general?

E.g. instead of http://purl.obolibrary.org/obo/ENVO_09200010 use https://bioregistry.io/envo:09200010

And instead of http://www.ebi.ac.uk/efo/EFO_0005147 use https://bioregistry.io/efo:0005147

Would be kind of a practical unification.
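
A rough Python sketch of that rewrite, assuming every term URL ends in `PREFIX_LOCALID` as in the two examples above. The helper name `bioregistry_url` and the regular expression are illustrative assumptions, not ARCtrl code.

```python
import re

# Assumption: the term URL ends in "<PREFIX>_<local id>", e.g. ".../ENVO_09200010".
_TERM_SUFFIX = re.compile(r"/(?P<prefix>[A-Za-z]+)_(?P<local_id>[^/?#]+)$")


def bioregistry_url(term_url: str) -> str:
    """Rewrite a PURL-style term URL into a bioregistry.io link (hypothetical helper)."""
    match = _TERM_SUFFIX.search(term_url)
    if match is None:
        raise ValueError(f"Cannot find PREFIX_LOCALID in {term_url!r}")
    # The examples above use the lower-cased prefix, a colon, then the local ID.
    return f"https://bioregistry.io/{match['prefix'].lower()}:{match['local_id']}"


print(bioregistry_url("http://purl.obolibrary.org/obo/ENVO_09200010"))
print(bioregistry_url("http://www.ebi.ac.uk/efo/EFO_0005147"))
```

This only produces the link. Where the link ends up depends on what Bioregistry has registered for that prefix.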


Edit: This doesn't work for PO though, as the Bioregistry entry links to the PURL even though the correct, direct link to OLS4 is also listed. [image]

HLWeil commented 4 months ago

@Hannah-Doerpholz, I opened a PR with the changes. This took quite a bit longer, as other tests still used hard-coded "deprecated" URLs.

With this change, the namespaces discussed here should result in working URLs, but working towards unification would still be highly welcome. The MS-style URLs in particular are clunky and harder to parse.
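
For readers following along, the kind of hard-coded override discussed in this thread could look roughly like the Python sketch below. It is not the code from the PR: the set `OLS4_PREFIXES`, the function `term_url`, and the fallback to the standard OBO PURL are assumptions made for illustration. The OLS4 link percent-encodes the PURL twice, which reproduces the MS, PO and RO links quoted earlier (without the optional `?lang=en`).

```python
from urllib.parse import quote

# Assumption: the prefixes reported in this thread as having broken PURLs.
OLS4_PREFIXES = {"MS", "PO", "RO"}


def term_url(prefix: str, local_id: str) -> str:
    """Return a link for a term, using OLS4 where the PURL is known to be broken."""
    # Assumption: the default is the standard OBO PURL pattern.
    purl = f"http://purl.obolibrary.org/obo/{prefix}_{local_id}"
    if prefix in OLS4_PREFIXES:
        # Encode twice: ":" becomes "%3A", then "%" becomes "%25", giving "%253A".
        encoded = quote(quote(purl, safe=""), safe="")
        return f"https://www.ebi.ac.uk/ols4/ontologies/{prefix.lower()}/classes/{encoded}"
    return purl


print(term_url("MS", "1002809"))
print(term_url("ENVO", "09200010"))
```

Parsing such a link back into prefix and local ID means decoding twice, which is one reason the MS-style URLs are harder to handle than plain PURLs.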

If parsing demands change (e.g. if ontologies are added to bioregistry), feel free to reopen this issue and add the requirements.

Hannah-Doerpholz commented 4 months ago

Sorry about another question, but I noticed that in Swate the links are not adjusted. For example, for "mass spectrometry" from MS, the link is still a purl.obolibrary link (bottom left): [Screenshot from 2024-06-17 15-35-20] The same also goes for DPBO. Is this currently the expected behaviour?

Freymaurer commented 4 months ago

Yes! We are working on the update for Swate with ARCtrl 2.0.0 integration. The expected release date (if not required earlier) is 27.06.2024.