Closed dbooth-boston closed 11 months ago
Note that `fhir:Coding.system` must be a URI, while `fhir:Coding.code` must be a code, which may include spaces (as per the spec). Looking through the examples, it looks like system URIs are not intended to be used as prefixes. For example (from https://www.hl7.org/fhir/datatypes-examples.html#coding, but please point me to other examples to add here):
Vocabulary | fhir:Coding.system | fhir:Coding.code | Actual prefix | Actual URL |
---|---|---|---|---|
ICD-10 | http://hl7.org/fhir/sid/icd-10 | G44.1 | http://purl.bioontology.org/ontology/ICD10/ | http://purl.bioontology.org/ontology/ICD10/G44.1 |
SNOMED CT | http://snomed.info/sct | 128045006:{363698007=56459004} | http://purl.bioontology.org/ontology/SNOMEDCT/ | http://purl.bioontology.org/ontology/SNOMEDCT/128045006 |
(Note that I wasn't able to find an online service that includes the more complex SNOMED code used in the example above.)
Possible outcomes: AFAICT, this means that we can't use the `fhir:Coding.system` as a prefix, and have to either:
1. look up each `fhir:Coding.system` and `fhir:Coding.code` combination in order to return a concept IRI, or
2. map `fhir:Coding.system` values to IRI prefixes, such that SNOMED CT (with a `fhir:Coding.system` of `http://snomed.info/sct`) will always be mapped to `http://purl.bioontology.org/ontology/SNOMEDCT/`. This could then be concatenated with the `fhir:Coding.code` (we would need to decide if spaces should be encoded as `+` or `%20`) to provide a full concept IRI. This could be stored in GitHub so that changes to it can be tracked, and hopefully could eventually be integrated into the FHIR specification.

It might be useful to make a list of every Coding system used in the FHIR examples, however this list is not exhaustive.
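The prefix-mapping option can be sketched roughly as follows. This is only an illustration: the two mapping entries come from the table above, the function name `concept_iri` is made up for this example, and percent-encoding every reserved character in the code (spaces as `%20`) is an assumption, since the encoding question is still open.

```python
from urllib.parse import quote

# Illustrative mapping from fhir:Coding.system values to IRI prefixes.
# Both entries are taken from the table above; a real mapping would be
# maintained separately (e.g. in a tracked file).
SYSTEM_TO_PREFIX = {
    "http://hl7.org/fhir/sid/icd-10": "http://purl.bioontology.org/ontology/ICD10/",
    "http://snomed.info/sct": "http://purl.bioontology.org/ontology/SNOMEDCT/",
}


def concept_iri(system, code):
    """Return a concept IRI for a system/code pair, or None if the system is unmapped."""
    prefix = SYSTEM_TO_PREFIX.get(system)
    if prefix is None:
        return None
    # quote() percent-encodes spaces as %20; quote_plus() would produce "+" instead.
    return prefix + quote(code, safe="")


print(concept_iri("http://hl7.org/fhir/sid/icd-10", "G44.1"))
```

Returning `None` for unmapped systems keeps the "no known prefix" case explicit rather than guessing an IRI.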
Can we put this into http://registry.fhir.org/ somehow? @gaurav to investigate.
Some healthcare systems also have their own internal coding systems -- how do we handle that?
Harold and I had decided that we could put them in if we had a mapping for them (assuming the mappings were reasonable to code generically). This means we can map SNOMED CT, LOINC, etc. using pretty official URLs. Others, we could "host" in an HL7 namespace until the org behind them saw the value and said "gimme!" At that point, you have a bit of a prob 'cause you don't want to maintain utterly enormous tables of OWL:sameIndividualAs links. I suspect the answer there would be writing custom code for the platform stuck with obsolete URLs.
UMLS could provide some basis for a hosted namespace for un-Web-ified vocabs.
> It might be useful to make a list of every Coding system used in the FHIR examples, however this list is not exhaustive.
I haven't had time to extract these yet, however, a list of `system` URIs that can be used in FHIR Codings is available at https://build.fhir.org/terminologies-systems.html
Some additional code systems are listed on the FHIR Terminology Service at http://tx.fhir.org/r5/ and on the HL7 Terminology Service at https://terminology.hl7.org/codesystems.html
I have learned a few more things:
- NamingSystem identifier types include `oid`, `uuid`, `uri`, `other`. It would be pretty cool if this had a `prefix` type as well!
- The CodeSystem resource declares the existence of a code system and its key properties including its preferred identifier. The NamingSystem resource identifies the existence of a code or identifier system, and its possible and preferred identifiers. The key difference between the resources is who creates and manages them - CodeSystem resources are managed by the owner or publisher of the code system, who can properly define the code system features and content. NamingSystem resources, on the other hand, are frequently defined by 3rd parties that encounter the code system in use, and need to describe the use, but do not have the authority to define the features and content. Additionally, there may be multiple authoritative NamingSystem resources for a code system, but ideally there would be only one authoritative CodeSystem resource (identified by its canonical URL) that is provided by the code system publisher, with multiple copies distributed on additional FHIR servers or elsewhere and used where needed.
- `system` code for an identifier or coding, which can be summarized as:
- `[ig-base-canonical]/CodeSystem/example-xxxxx`.
- (`http://hl7.org/fhir/sid/ndc`). However, this does NOT have information on potential prefixes.

So, I think there are a series of potential solutions we can implement:
- Add `prefix` as an identifier type to `NamingSystem.identifier.type` and fill in prefixes for the 255 naming systems currently published to terminology.hl7.org. We can then use the `hl7-terminology` NPM package to read this information and fill in prefixes when given a system and code pair.
- `hl7-terminology` to check for unmapped naming systems.

Do you all think this would cover all our needs?
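To make the first proposal more concrete, here is a minimal sketch of how prefixes could be read from NamingSystem resources and how unmapped systems could be reported. It assumes a hypothetical `prefix` value for `uniqueId.type` (which is exactly what is being proposed, not something FHIR defines), represents the resources as plain dictionaries rather than reading the `hl7-terminology` package, and uses made-up `name` values and a made-up function name.

```python
def build_prefix_table(naming_systems):
    """Map each `uri` unique ID to the `prefix` unique ID of the same NamingSystem.

    Returns (table, unmapped), where `unmapped` lists the URIs of naming
    systems that carry no prefix.
    """
    table = {}
    unmapped = []
    for ns in naming_systems:
        ids = ns.get("uniqueId", [])
        uris = [u["value"] for u in ids if u.get("type") == "uri"]
        # "prefix" is the hypothetical identifier type proposed in this thread.
        prefixes = [u["value"] for u in ids if u.get("type") == "prefix"]
        if prefixes:
            for uri in uris:
                table[uri] = prefixes[0]
        else:
            unmapped.extend(uris)
    return table, unmapped


# Hand-written stand-ins for NamingSystem resources (names are illustrative).
naming_systems = [
    {
        "resourceType": "NamingSystem",
        "name": "SNOMED_CT",
        "uniqueId": [
            {"type": "uri", "value": "http://snomed.info/sct"},
            {"type": "prefix", "value": "http://purl.bioontology.org/ontology/SNOMEDCT/"},
        ],
    },
    {
        "resourceType": "NamingSystem",
        "name": "NDC",
        "uniqueId": [{"type": "uri", "value": "http://hl7.org/fhir/sid/ndc"}],
    },
]

table, unmapped = build_prefix_table(naming_systems)
print(table)
print(unmapped)
```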
NamingSystem -> non-authoritative third-party annotation about a code system
CodeSystem -> authoritative annotation by the publisher of a code system
Might want to have the prefix in CodeSystem -- there should only be one authoritative prefix/format for each coding system
CodeSystem URLs are based on hl7.org (e.g. http://hl7.org/fhir/sid/ndc), but the goal is probably to replace this with an authoritative URL when the resource wants to take over.
Gaurav to dig into CodeSystem to figure out where the prefix could go there.
The prefix could potentially go into the `CodeSystem.identifier`, which is an `Identifier` with both an `IdentifierType` (named `type`) and `IdentifierUse` (named `use`). We might consider `prefix` as a potential value for `use`. There is also a generic `CodeSystem.property` field that we could use, but I think Identifier would be more specific.
So I think the next step is to write all of this up somewhere and then submit it to the FHIR writers to see what they think?
`type` might be better to use here, since it is Extensible -- we can make up new types as needed.
I downloaded and executed the code in https://github.com/HL7/UTG using Java 11. It generated the HTML documentation you see at https://terminology.hl7.org/. In doing so, it appears to use both tx.fhir.org (“Connect to Terminology Server at http://tx.fhir.org”, “-tx: Connect to http://tx.fhir.org/r4”) and hl7-terminology (“Installing hl7.terminology#3.0.0 to the package cache”, which I haven’t figured out where that is). I'll open an issue at https://github.com/HL7/UTG to hopefully get to the bottom of this, and am hoping that other FHIRCat team members like @ericprud or @dksharma might know as well.
Once I figure out how to modify those CodeSystem/NamingSystem files, I'm planning to create a (forked?) repository with prefixes added to some of those files, and write a little demonstration tool that uses that information to convert FHIR codings into RDF concept URLs and vice versa.
In the meantime, I'm also writing up a more formal description of this issue and possible solutions. This might be useful later on if we do need to explain what we're doing to people outside our team. I'll set it to be view-only since I'm posting that URL publicly, but please do request editing rights to that document if you would like to help!
Current strategy:
Note that the fallback plan -- if HL7/FHIR refuse to put this into terminology.fhir.org -- would probably be to maintain this list separately.
Make sure that this works with US Core terminology: http://www.hl7.org/fhir/us/core/terminology.html -- they require specific URLs in that system, so we don't want to overwrite that or mess with it.
Here are eight candidates for coding system/naming systems mentioned in the FHIR R5 examples that we can provide prefixes for:
All of these have ten or more mentions in the FHIR R5 examples, so we could further check on resolvability by (e.g.) looking up all the referenced codes to see if they work as expected.
@ericprud @balhoff You both have a lot more experience with RDF prefixes than I do, so if you see something I can do better here, please let me know!
Weekly update:
Next steps:
Tasks further down the line:
- `prefix` value for NamingSystem.uniqueId.type and CodeSystem.identifier (maybe https://github.com/HL7/fhir-ig-publisher/blob/master/org.hl7.fhir.publisher.core/src/main/java/org/hl7/fhir/igtools/publisher/Publisher.java?)

Re: the SNOMED `128045006:{363698007=56459004}` compositional syntax, just URL-encoding it for now seems fine. But note that this is unneeded in FHIR, since you can express this in other ways. Also: it's good to push people towards prefixes rather than trying to do this in a more complicated way.
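For reference, this is roughly what URL-encoding the compositional expression looks like, and how the original code can be recovered from the resulting IRI. It is a sketch only: the prefix is the BioPortal one from the table earlier in this thread, and encoding every reserved character (`safe=""`) is an assumption rather than an agreed rule.

```python
from urllib.parse import quote, unquote

# Prefix taken from the table earlier in this thread.
SNOMED_PREFIX = "http://purl.bioontology.org/ontology/SNOMEDCT/"
code = "128045006:{363698007=56459004}"

# Percent-encode every reserved character in the code (":", "{", "=", "}").
iri = SNOMED_PREFIX + quote(code, safe="")
print(iri)

# Decoding the part after the prefix recovers the original code unchanged.
recovered = unquote(iri[len(SNOMED_PREFIX):])
print(recovered)
```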
Do we need to canonicalize blank spaces/pipes/etc in the code value? Probably not -- we can leave them as is and leave it to downstream processing.
I've uploaded to Google Drive the lists of all system codes in R4/R5 (`system-codes-r[45].tsv`) and the unique system/code pairs (`unique-codes-r[45].tsv`). I'm trying to figure out some way to validate whether the IRIs being generated are correct -- for now, I'm trying to see whether those IRIs are resolvable (`resolved-r[45].tsv`). For the FHIR JSON examples for R5, I got 370 unique `system` values with a total of 1,968 unique system-code pairs, of which I could generate 789 concept IRIs using the five examples described above. Out of 789 IRIs I attempted to resolve, I got 671 successes (HTTP 200), 112 not found (HTTP 404), 3 server errors (HTTP 500) and 2 request timeouts. So it looks like this approach might be worth pursuing? Some of those 404s are IRIs that are not intended to resolve, so we might want to try resolving them against the OLS instead.
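The resolvability check boils down to tallying one outcome per IRI. A minimal sketch of that tally is below; it is not the script that produced the numbers above. The function names are made up, and the HTTP lookup is passed in as a callable (here replaced by a fake) so the counting logic can be run without network access.

```python
from collections import Counter


def tally_resolution(iris, fetch_status):
    """Count the outcome of resolving each IRI.

    `fetch_status` is any callable that returns an HTTP status code for an
    IRI or raises TimeoutError; it is injected so the tally works offline.
    """
    counts = Counter()
    for iri in iris:
        try:
            counts[fetch_status(iri)] += 1
        except TimeoutError:
            counts["timeout"] += 1
    return counts


# Stand-in for a real HTTP client, used only for illustration.
def fake_fetch(iri):
    if iri.endswith("/slow"):
        raise TimeoutError
    return 200 if iri.endswith("/G44.1") else 404


counts = tally_resolution(
    [
        "http://purl.bioontology.org/ontology/ICD10/G44.1",
        "http://example.org/missing",
        "http://example.org/slow",
    ],
    fake_fetch,
)
print(counts)
```

In a real run, `fetch_status` would wrap whatever HTTP client is in use, and the same tally could be repeated against a second resolver such as the OLS.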
I'm going to pause the software development work here to finish writing up the problem discussion I was working on earlier so we can check to see if there's anything missing here.
- `v3-*` probably comes from http://www.hl7.org/implement/standards/rim.cfm -- look into adding that as prefix.

I've updated the files (see Google Drive directory and resolved-r5 sheet) to include the `display` field from the FHIR Examples.
I've written up a brief summary of the problem and our proposed solution on Google Docs -- you can only comment on the document with that link, but please do request editor access if you'd like to help make it better and prepare it for submission to the FHIR chat! Before we submit it there, I'd love to link to it from https://github.com/HL7/UTG/issues/7 and ask Chris Mungall to have a look at it, as he might be interested in this as well.
As per our discussion last Thursday, I've asked chat.fhir.org for suggestions on sources of Coding.system/code pairs that are in use "in the wild": https://chat.fhir.org/#narrow/stream/179202-terminology/topic/Getting.20lists.20of.20CodeSystem.2FNamingSystems.20currently.20in.20use
Grahame suggested checking system/code pairs from Synthea, which is available as software code (https://github.com/synthetichealth/synthea) or synthetic data sets (https://synthea.mitre.org/fhir-api).
- `flatIRIStem` and `hierarchicalIRIStem` as separate properties to indicate which algorithm we want people to use

Putting IRI stems into the HL7 repo would only be adding identifiers to that repo, so it does not need to be R5 balloted. But we do need to change the spec for R5 to say that "if the concept IRI is known, then add it to the RDF".
On today's call we made two decisions:
Now that TSMG and the RDF subgroup have both voted on this, I think these are the next steps:
Done, though addition of some more IRI stems continues.
Individual concepts do not necessarily have canonical URIs to identify them. See example. Should we do something about that? Should we concatenate the fhir:Coding.system with the fhir:Coding.code in some way, to produce a canonical URI for the concept?