w3c / hcls-fhir-rdf

Sketching out an RDF representation for FHIR

Figure out a way to handle CURIES alongside IRI stems #127

Closed gaurav closed 3 months ago

gaurav commented 11 months ago

Our IRI prefix proposal assumes that concept IRI prefixes can be concatenated with the code -- for example, if system is http://loinc.org (IRI prefix: http://loinc.org/rdf/) and fhir.code is 55284-4, we can concatenate that into the concept IRI http://loinc.org/rdf/55284-4. However, this won't work for ontologies whose identifiers are conventionally represented as CURIEs. For example, COVID-19 in MONDO is conventionally represented as MONDO:0100096, and we use the IRI prefix of http://purl.obolibrary.org/obo/MONDO_. If this is represented in FHIR as system = http://purl.obolibrary.org/obo/mondo.owl and code = MONDO:0100096, our current scheme would concatenate that into http://purl.obolibrary.org/obo/MONDO_MONDO:0100096, which is incorrect.
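To make the failure concrete, here is a minimal sketch of the plain concatenation described above. The function name `naive_concept_iri` is hypothetical and only illustrates the scheme; it is not part of any existing tool.

```python
def naive_concept_iri(iri_stem: str, code: str) -> str:
    """Concatenate an IRI stem with a FHIR code, with no special handling."""
    return iri_stem + code


# LOINC: the code is a bare identifier, so concatenation gives a usable IRI.
loinc = naive_concept_iri("http://loinc.org/rdf/", "55284-4")

# MONDO: the code is a CURIE, so the prefix ends up duplicated in the IRI.
mondo = naive_concept_iri("http://purl.obolibrary.org/obo/MONDO_", "MONDO:0100096")

print(loinc)
print(mondo)
```

The second result repeats the MONDO prefix, which is the problem this issue is about.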

Some possible solutions:

  1. We require that all FHIR systems SHOULD use system = http://purl.obolibrary.org/obo/mondo.owl and code = 0100096 so that we can concatenate that into http://purl.obolibrary.org/obo/MONDO_0100096. However, we note that older systems might still use MONDO:0100096, and that consumers should be ready to treat the input as a CURIE and remove the prefix if spotted.
    • For legacy systems, we could build and deploy a "concept IRI generator" which has this logic built in, so you can send it the FHIR system/code pair and it will return the correct concept IRI whether or not a code or a CURIE is used.
  2. We require that all FHIR systems SHOULD use system = http://purl.obolibrary.org/obo/mondo.owl and code = MONDO:0100096 as per common usage. In order to construct the concept IRI, an automated process will need to take an extra step, which would be one of:
    1. If the code looks like a CURIE, the prefix should automatically be removed.
    2. If the code looks like a CURIE and it starts with the final path component of the IRI stem (i.e. MONDO_), except with a colon instead of an _ as the final character (i.e. MONDO:), then the prefix should be removed.
  3. We require that all FHIR systems SHOULD use system = http://purl.obolibrary.org/obo/mondo.owl and code = MONDO:0100096 as per common usage, and we develop a new syntax for the prefix in order to indicate that a CURIE might be used here (e.g. http://purl.obolibrary.org/obo/MONDO_|MONDO:).
    • Alternatively, we could add an additional "CURIE prefix" value to the definition (for MONDO, this would be either MONDO or MONDO:), which could be used to remove the CURIE prefix if spotted.
  4. We require that all FHIR systems SHOULD use system = http://purl.obolibrary.org/obo/mondo.owl and code = MONDO_0100096, and then set the IRI prefix to http://purl.obolibrary.org/obo/.
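The prefix-handling rules in options 2.1, 2.2 and 3 can be compared with a rough sketch. Everything here is an assumption made for illustration: the function names, the regular expression used to decide what "looks like a CURIE", and the reading of the `stem|prefix` syntax from option 3 as a single pipe-separated string.

```python
import re
from typing import Optional, Tuple

# Assumption: a CURIE-like code is PREFIX:LOCAL with a letter-initial prefix.
CURIE_LIKE = re.compile(r"^([A-Za-z][A-Za-z0-9_.-]*):(.+)$")


def strip_any_prefix(code: str) -> str:
    """Option 2.1: drop whatever precedes the first colon of a CURIE-like code."""
    match = CURIE_LIKE.match(code)
    return match.group(2) if match else code


def strip_stem_prefix(iri_stem: str, code: str) -> str:
    """Option 2.2: drop the prefix only if it matches the stem's last path component."""
    last = iri_stem.rsplit("/", 1)[-1]      # e.g. "MONDO_"
    if last.endswith("_"):
        expected = last[:-1] + ":"          # e.g. "MONDO:"
        if code.startswith(expected):
            return code[len(expected):]
    return code


def split_annotated_stem(annotated: str) -> Tuple[str, Optional[str]]:
    """Option 3: split 'stem|CURIE-prefix' into (stem, prefix or None)."""
    stem, sep, curie_prefix = annotated.partition("|")
    return stem, (curie_prefix if sep else None)


MONDO_STEM = "http://purl.obolibrary.org/obo/MONDO_"
print(strip_any_prefix("MONDO:0100096"))
print(strip_stem_prefix(MONDO_STEM, "MONDO:0100096"))
print(split_annotated_stem(MONDO_STEM + "|MONDO:"))
```

One difference worth noting: option 2.1 would also truncate a code that legitimately contains a colon, whereas option 2.2 leaves such a code alone unless its prefix matches the stem.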

We should share this discussion with the TIMS group so that we are all aligned on the best solution before presenting this to FHIR.

/cc @balhoff, @dbooth-boston, @ericprud

dbooth-boston commented 11 months ago

Discussed on 8/10/23: https://www.w3.org/2023/08/10-hcls-irc#T15-42-26

gaurav commented 10 months ago

Another data point to add to this conversation: https://www.wikidata.org/wiki/Property:P1554 (UBERON ID) does NOT use the UBERON: prefix, but https://www.wikidata.org/wiki/Property:P5270 (MONDO ID) does use the MONDO: prefix.

joeflack4 commented 8 months ago

TIMS group current relevance

This topic is coming back to relevance for the TIMS group presently as we are refining OMOP2FHIR-vocab.

Our present concern is broader: which to use to codify ontology-sourced concepts--URI, CURIE, or code--with various structures to consider. I like the idea of using FHIR coding (system+code) though.

About the proposals above

My concern with the 4 options in the OP is that system is proposed as http://purl.obolibrary.org/obo/mondo.owl, but that in order to get the concept URI, you'd have to first remove mondo.owl and add MONDO_ before adding the code. How does a user or system know to do that?

OMOP considerations

I don't know what you all think about this, but when our OMOP tool converts OMOP to FHIR, it creates 2 sets of output: split and combined. So for combined, there is a single "OMOP" CodeSystem created. For split, each vocabulary in OMOP gets its own CodeSystem. It's also possible for a user to download a subset of OMOP tables from Athena, and thus the combined artefact won't contain 100% of OMOP. There isn't really an open, authoritative, resolvable URI for these source "systems". OMOP terms do, however, have resolvable URIs. Example: https://athena.ohdsi.org/search-terms/terms/36249424

Other proposed options

Shall I continue with the numbering schema above?

  1. What if CodeSystem.url is the resolvable / authoritative system URI, e.g. http://purl.obolibrary.org/obo/mondo.owl? CodeSystem.concept.code can only be a code type (a plain code or a CURIE would be valid here, but not a coding (system+code)). But CodeSystem.concept.designation.use can be a coding (system+code). For that coding, is it perhaps alright if the system is the URI stem (e.g. http://purl.obolibrary.org/obo/MONDO_), and the code is just a code, not a CURIE?
  2. Instead of using coding (system+code), we use CURIEs, and then some FHIR extension element to help expand those CURIEs, like a curie_map? I think you guys might have discussed this and decided against, though.
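For reference, the curie_map idea in option 2 amounts to a prefix lookup followed by concatenation. The sketch below is only an illustration: `CURIE_MAP` and `expand_curie` are hypothetical names, and no such FHIR extension element is defined in this thread.

```python
from typing import Dict

# Hypothetical prefix map; in option 2 this would live in some FHIR extension.
CURIE_MAP: Dict[str, str] = {
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_",
}


def expand_curie(curie: str, curie_map: Dict[str, str]) -> str:
    """Expand PREFIX:LOCAL into a full IRI using the given prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in curie_map:
        raise ValueError(f"cannot expand {curie!r}: unknown or missing prefix")
    return curie_map[prefix] + local_id


print(expand_curie("MONDO:0100096", CURIE_MAP))
```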
gaurav commented 8 months ago

Thanks so much for the detailed response, @joeflack4!

My concern with the 4 options in the OP is that system is proposed as http://purl.obolibrary.org/obo/mondo.owl, but that in order to get the concept URI, you'd have to first remove mondo.owl and add MONDO_ before adding the code. How does a user or system know to do that?

Our goal is to add those mappings to terminology.hl7.org (THO). Our exemplar (and sole actual example) right now is LOINC -- if you look up the LOINC codesystem on THO, you'll see that one of the identifiers has type iri-stem and the value http://loinc.org/rdf/. This means that the IRI stem for LOINC codes is http://loinc.org/rdf/, and you can construct a concept IRI for LOINC codes by concatenating http://loinc.org/rdf/ with a LOINC code.

At some point (see #123) we'd like to add http://purl.obolibrary.org/obo/MONDO_ as the IRI stem for MONDO, so that a user can create concept IRIs from MONDO by concatenating that with MONDO values.
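As a rough sketch of how a consumer might use such an identifier: the code below looks for an identifier of type iri-stem in a CodeSystem and concatenates its value with a code. The dictionary is a hand-written stand-in, not the actual THO resource, and the assumption that the type appears as a coding with code `iri-stem` should be checked against the real LOINC entry. `find_iri_stem` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def find_iri_stem(code_system: Dict[str, Any]) -> Optional[str]:
    """Return the value of the first identifier typed as 'iri-stem', if any."""
    for identifier in code_system.get("identifier", []):
        codings = identifier.get("type", {}).get("coding", [])
        if any(coding.get("code") == "iri-stem" for coding in codings):
            return identifier.get("value")
    return None


# Assumed, simplified shape of a CodeSystem resource (not copied from THO).
loinc_like = {
    "resourceType": "CodeSystem",
    "url": "http://loinc.org",
    "identifier": [
        {"type": {"coding": [{"code": "iri-stem"}]}, "value": "http://loinc.org/rdf/"},
    ],
}

stem = find_iri_stem(loinc_like)
concept_iri = stem + "55284-4" if stem else None
print(concept_iri)
```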

gaurav commented 8 months ago

We have a winner! Everybody on this call (https://www.w3.org/2023/10/12-hcls-irc#T15-45-43) agrees that:

We require that all FHIR systems SHOULD use system = http://purl.obolibrary.org/obo/mondo.owl and code = 0100096

Is the winner for our purposes.

joeflack4 commented 8 months ago

Ah, interesting. I've never seen a custom type, as used by the IRI stem above.

Note for OWL/FHIR converters, there's also our https://github.com/hot-ecosystem/owl-on-fhir, which is less minimalistic than https://github.com/aehrc/fhir-owl but/and is still in development. Like I can't even remember if the default is CURIE or URI right now. But I'm over 90% sure that I'm going to change the default/only behavior to what has just been decided in this thread.

gaurav commented 8 months ago

Tiago Lubiano wrote a nice blog post describing how Wikidata handles converting codes into IRIs -- not directly related to this conversation, but certainly interesting.

dbooth-boston commented 4 months ago

@gaurav Since we decided to recommend that prefixes not be a part of a terminology Coding.code, does that decision close this issue?

gaurav commented 3 months ago

Yes, I think it does! I was going to suggest documenting it somewhere in the FHIR spec, but #140 covers that, so I think we can go ahead and close this issue. Note that as per https://github.com/w3c/hcls-fhir-rdf/issues/127#issuecomment-1759884313 we will want to point out that other systems may use CURIEs (e.g. HP:0041088 on CSIRO Ontoserver), so although we're recommending bare codes, consumers should accept CURIEs if they find them in FHIR.
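A tolerant consumer along these lines could look roughly like the sketch below. It assumes the consumer knows both the IRI stem and the CURIE prefix for each system; the function name `concept_iri` and the HP stem shown are illustrative assumptions, not something fixed in this thread.

```python
def concept_iri(iri_stem: str, curie_prefix: str, code: str) -> str:
    """Build a concept IRI from a bare code, tolerating a leading CURIE prefix."""
    marker = curie_prefix + ":"
    if code.startswith(marker):
        # Legacy input such as "MONDO:0100096": drop the prefix first.
        code = code[len(marker):]
    return iri_stem + code


MONDO_STEM = "http://purl.obolibrary.org/obo/MONDO_"
HP_STEM = "http://purl.obolibrary.org/obo/HP_"  # assumed stem for HP

print(concept_iri(MONDO_STEM, "MONDO", "0100096"))        # recommended bare code
print(concept_iri(MONDO_STEM, "MONDO", "MONDO:0100096"))  # CURIE, still accepted
print(concept_iri(HP_STEM, "HP", "HP:0041088"))
```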