rcpch / rcpch-audit-engine

Epilepsy12 Audit Platform
https://e12.rcpch.ac.uk/
GNU Affero General Public License v3.0

Review of SNOMED terms handling within the E12 app #782

Open pacharanero opened 8 months ago

pacharanero commented 8 months ago

Issues we have with SCT handling at present

Thoughts about how we would need to handle it in future

This is just a placeholder for a longer discussion we need to have about SNOMED management - views welcome

wardle commented 5 months ago

I spotted this when looking at @eatyourpeas's issue about simpler maintenance of a terminology server.

From what I can understand, you need a list of codes that might come from multiple sources: a manually curated list, plus the codes produced by expanding a SNOMED ECL expression (e.g. all concepts in a specific reference set would be "^xxxx", where xxxx is the reference set identifier).

The options for solving such an issue are therefore:

  1. Include any manually defined codes into a SNOMED ECL expression.
  2. Generate a unified list of causes ahead of time
  3. Generate a unified list of causes whenever they are needed
  4. Request that the reference set has the missing items added via SNOMED International

The first option simply uses the fact that ECL can include fairly complex expressions and any terminology service when expanding that expression should remove duplicates.

For example, the expression <<24700007 OR ^ 1127821000000102 will add multiple sclerosis, and all of its descendants, into the expansion of this renal injury reference set (1127821000000102). This first option would work well if you only have a few manual additions to an already complete reference set, and it will keep working when the reference set is updated. You can build ECL expressions by banging strings together, of course, so your manually created list of concepts could be read from a file. I would not recommend this option if you need to add very many additional concepts to a reference set. You can also build codelists from an expansion of multiple reference sets, or combine arbitrary rules such as limiting to diagnoses, or only problems without respiratory diagnoses, or whatever.
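As a rough illustration of option 1, here is a minimal Python sketch of building such an expression from a reference set identifier and a short list of manually chosen concepts. The function names (`build_ecl`, `read_concept_ids`) and the one-identifier-per-line file layout are assumptions made for this example, not part of any existing tool.

```python
from pathlib import Path


def read_concept_ids(path: str) -> list[str]:
    """Read one concept identifier per line, skipping blanks and '#' comments.

    The file layout is an assumption for this sketch.
    """
    lines = Path(path).read_text().splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]


def build_ecl(refset_id: str, manual_concept_ids: list[str]) -> str:
    """Combine a reference set with manually listed concepts into one ECL string.

    Each manual concept is included together with its descendants (<<).
    """
    clauses = [f"^{refset_id}"]
    # dict.fromkeys removes repeated identifiers while keeping their order
    for concept_id in dict.fromkeys(c.strip() for c in manual_concept_ids):
        if not concept_id.isdigit():
            raise ValueError(f"not a SNOMED CT identifier: {concept_id!r}")
        clauses.append(f"<<{concept_id}")
    return " OR ".join(clauses)


if __name__ == "__main__":
    # Same reference set and concept as in the example above
    print(build_ecl("1127821000000102", ["24700007"]))
```

The resulting string can be handed to whichever terminology server is in use; because every clause is joined with OR, no brackets are needed.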

Options 2 and 3 are also simple. I generally prefer to leave the composition of codelists to as late as possible - in essence this is really a caching issue - and I'd just use my programming language's set operations to remove duplicates. I'm currently really enjoying using SQLite to cache data in some of my pipeline processes and automations, and it would easily cope as an intermediate data store, gracefully handling the situation where a reference set is updated in a new SNOMED release so that you might otherwise end up with duplicates. In Clojure, I'd simply read a CSV file from disk containing my manually curated codes, hit the terminology server (e.g. over HTTP, or in-process when I want to keep things simple), perform a set union, and then use the result directly or write it to disk or a database for later use. In that way, you get the benefit of a declarative approach, and simpler maintenance.
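The same idea, sketched in Python rather than Clojure. This is only an illustration under stated assumptions: the CSV is assumed to have a `conceptId` column, the `codelist` table name is made up, and `expand` is a hypothetical callable standing in for whatever call your terminology server offers for expanding an ECL expression.

```python
import csv
import sqlite3
from typing import Callable, Iterable


def read_manual_codes(lines: Iterable[str]) -> set[str]:
    """Collect concept ids from CSV lines with a 'conceptId' header (assumed layout)."""
    reader = csv.DictReader(lines)
    return {
        (row.get("conceptId") or "").strip()
        for row in reader
        if (row.get("conceptId") or "").strip()
    }


def unified_codes(
    manual: Iterable[str],
    expand: Callable[[str], Iterable[str]],
    ecl: str,
) -> set[str]:
    """Union the manual codes with the server's expansion; a set drops duplicates."""
    return set(manual) | set(expand(ecl))


def cache_codes(conn: sqlite3.Connection, codes: Iterable[str]) -> None:
    """Store codes in SQLite; the primary key plus INSERT OR IGNORE makes reruns safe."""
    conn.execute("CREATE TABLE IF NOT EXISTS codelist (concept_id TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO codelist (concept_id) VALUES (?)",
        [(code,) for code in sorted(codes)],
    )
    conn.commit()


if __name__ == "__main__":
    # Stand-in for a real terminology server call (hypothetical data)
    def fake_expand(ecl: str) -> list[str]:
        return ["24700007", "84757009"]

    manual = read_manual_codes(["conceptId,term", "24700007,Multiple sclerosis"])
    codes = unified_codes(manual, fake_expand, "^1127821000000102")
    conn = sqlite3.connect(":memory:")
    cache_codes(conn, codes)
    print(sorted(codes))
```

Passing `expand` in as a function keeps the union logic independent of the server, so it can be swapped between an HTTP call and an in-process one. For a file on disk, `read_manual_codes` can be given an open file object, since `csv.DictReader` accepts any iterable of lines.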

Options 1-3 assume you have SNOMED codes for any diagnoses not in the reference set. If there are genuinely missing codes, then you should definitely look at option (4), either to add the concept or to fix the synonyms linked to that concept. In the meantime, you could consider building an abstraction above SNOMED CT in which you have pairs of namespace and code, as per FHIR, so that you resolve against an internal namespace for your proprietary codes, and use snomed.info/sct as the namespace for conditions within SNOMED CT. You'd then need to add some kind of equivalence system to map from one to the other, in case SNOMED CT later adds your missing codes. That can work, and is why equivalence and mapping between code systems are an important first-class problem in informatics that won't go away any time soon. I do this in https://github.com/wardle/codelists, which uses (declarative) definitions of codelists to generate codes from ICD-10, dm+d, ATC etc., similar to Ben Goldacre's opencodelists but using rules rather than manually curated codes.
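To make the namespace-and-code idea concrete, here is a small Python sketch. It is not how wardle/codelists or the E12 app does it; the `Coding` class, the `LOCAL` namespace URI, the local code `gene-001` and the example equivalence are all hypothetical. Only `http://snomed.info/sct` is the real system URI that FHIR uses for SNOMED CT.

```python
from dataclasses import dataclass

SNOMED = "http://snomed.info/sct"          # FHIR system URI for SNOMED CT
LOCAL = "https://example.org/e12/causes"   # hypothetical internal namespace


@dataclass(frozen=True)
class Coding:
    """A (namespace, code) pair, loosely modelled on a FHIR Coding."""
    system: str
    code: str


def resolve(coding: Coding, equivalents: dict[Coding, Coding]) -> Coding:
    """Follow equivalence links until none applies (or a cycle is detected)."""
    seen: set[Coding] = set()
    while coding in equivalents and coding not in seen:
        seen.add(coding)
        coding = equivalents[coding]
    return coding


if __name__ == "__main__":
    # A locally minted cause, recorded before any SNOMED CT concept existed
    local_cause = Coding(LOCAL, "gene-001")
    # Added later, once an equivalent concept is published (illustrative pairing only)
    equivalents = {local_cause: Coding(SNOMED, "24700007")}
    print(resolve(local_cause, equivalents))
```

Stored records keep the local code they were entered with; the equivalence table is the only thing that changes when SNOMED CT catches up.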

Option 4 is okay, but will take time and need justification. I have raised a couple of issues in the past and the team have fixed them, so that process does work.

Hope that helps. Happy to chat if that is easier. Obviously all of the above could work with any terminology server - they should be interchangeable.

Mark

eatyourpeas commented 5 months ago

Thank you Mark, this is a thoughtful summary and I am grateful to you for having put it together. Our use case is that users of this national audit define childhood epilepsy, where they can, in terms of its cause. Being paediatrics, SNOMED seems very slow to catch up: the refset we have is incomplete, and some of the causes our users are asking to use (individual gene defects, metabolic diagnoses and so on) do not all seem to have SNOMED terms. That being so, we have created a separate SQL table for the causes and store them there, along with the conceptId and preferredTerm if they exist. My guess is that this authoritative list will ultimately be useful to SNOMED, since it will represent a national dataset of real-world epilepsy causes. But that does not really help us, since we cannot wait for SNOMED to catch up with our requirements. So our workflow currently is that, as users ask for new terms to be added, we check them first against our Hermes server and add them to our table if they exist there; if they do not, we add them manually. Most are in SNOMED but not all. So I guess that would be your option 2? Possibly we should chat, as you clearly have a better handle on this than I do.