wardle / hermes

A library and microservice implementing the health and care terminology SNOMED CT with support for cross-maps, inference, fast full-text search, autocompletion, compositional grammar and the expression constraint language.
Eclipse Public License 2.0

Support for other terminologies? #23

Closed sidharthramesh closed 3 years ago

sidharthramesh commented 3 years ago

I think SNOMED CT is probably the most complex terminology in the healthcare space. Is there any scope to coerce other terminologies like LOINC and ICD-10 into the search engine?

PS: I have scripts that convert LOINC and ICD-10 into SNOMED RF2 release files. They work with Hermes. However, having multiple endpoints would be nice. Maybe a uniform release file that Hermes can consume?

wardle commented 3 years ago

In short, no.

The long answer is that I truly think these should be independent services, but with the easy ability to compose them into a single seamless service if so desired.

That's what I've done with dm+d - I seamlessly navigate across code systems - fetching a drug using SNOMED, getting its ingredients and concrete data from dm+d and then using SNOMED on those data.

I have a private repo for LOINC. It is only at an early stage, but it is on my work plan. ICD-10 not so much, because I only need crossmaps for that. I will let you know when I release the first version of "oink".

Why not a single engine? Because the import, processing and search are so easy, why couple things that are separate? Build data and computing services that can be composed together. It is much easier to have a single domain in a single repository with a small, easily understood code base.
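A minimal sketch of that kind of composition, in Python rather than the project's own Clojure. Everything here is hypothetical: the class names, the `describe_product` function and the identifiers are invented for illustration and are not the Hermes or dm+d APIs. The point is only that each service stays small and the calling code joins them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ingredient:
    concept_id: str  # placeholder identifier, not a real SNOMED CT code
    strength: str    # concrete data held only by the drug dictionary


class TerminologyService:
    """Stand-in for a SNOMED CT service: identifier -> preferred term."""

    def __init__(self, terms):
        self._terms = dict(terms)

    def preferred_term(self, concept_id):
        return self._terms[concept_id]


class DrugDictionaryService:
    """Stand-in for a dm+d service: product identifier -> ingredients."""

    def __init__(self, ingredients):
        self._ingredients = dict(ingredients)

    def ingredients(self, product_id):
        return self._ingredients.get(product_id, [])


def describe_product(product_id, terminology, drugs):
    """Compose the two services without either knowing about the other."""
    return {
        "product": terminology.preferred_term(product_id),
        "ingredients": [
            (terminology.preferred_term(i.concept_id), i.strength)
            for i in drugs.ingredients(product_id)
        ],
    }


# Made-up data, purely to exercise the composition.
terminology = TerminologyService({"P1": "Example tablets", "S1": "Example substance"})
drugs = DrugDictionaryService({"P1": [Ingredient("S1", "500 mg")]})
result = describe_product("P1", terminology, drugs)
print(result)
```

Either service can be replaced or versioned on its own, since only `describe_product` depends on both.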

wardle commented 3 years ago

I should add that previously I've written systems that combine everything and it is a pain for maintainability and testing.

I did implement genetic code systems at one point - eg OMIM - and so doing the same as a separate service is also on my work plan.

sidharthramesh commented 3 years ago

Okay! Thank you for the explanation.

wardle commented 3 years ago

Hi Sidharth. I can't see a situation where I'd want to search the text of ICD-10. Better to search SNOMED and map to ICD-10? What's your use case?

LOINC is different and provides value above and beyond what SNOMED can offer, but has its own idiosyncrasies. It remains slightly unclear to me how best to provide search from a user point of view, if indeed search is needed rather than just being able to make sense of a LOINC code you happen to be processing. I've made a repo public (see oink), although that's at a very early stage with no CLI. It is only usable at the Clojure REPL (which is the best way to develop, I might add).

Anyway, it can already give you the data for each LOINC number, give the component parts in a structured format, and give you the multi-axial ontology-like paths to root as well as crossmaps. I need to make it accessible via a REST API for general use, and add a command line for index creation, but if you have any specific use cases, that would help me. My plan is to use this EAV store to populate an optimised Lucene index with the core components. The distribution also contains what are essentially synonym tables, but they have lots of words in them, so search is likely to return a lot of false positives unless they are handled carefully. Do you have a lot of experience with LOINC, and what do you need?

sidharthramesh commented 3 years ago

Clinicians may not search for ICD-10 codes, but it's very common during billing and insurance. And even amongst clinicians, some specialities like Psychiatry talk about diseases in terms of ICD codes. Even while teaching and discussing among peers. So having the ability to search and enter ICD-10 codes is essential.

Regarding LOINC, searching on LOINC and getting the right codes for a test is actually pretty hard. The mapping work is done in the background and just getting the description for a LOINC code would fit most data entry scenarios. However, labs may sometimes have to map their internal codes to LOINC and that's probably where the search will be used. They could always use the LOINC website for that. LOINC fits the result of lab tests well, but for ordering a particular test or panel, I'd still use SNOMED CT.

In fact, I think most of the data entry needs of a clinician are covered by SNOMED CT (90%). ICD-10 is required at billing and insurance and it needs to be searchable (10%). LOINC is mostly mapped in the background. From a clinical point of view, it is not required to build a search on top of it, unless we're creating an application that is used by labs to actually do the mapping - which is a one-time task. They can always use the LOINC website for that, but having the option to choose between LOINC search engines would be great too.

wardle commented 3 years ago

Thanks Sidharth - that is helpful. If the authoritative record is SNOMED CT - e.g. a recorded list of diagnoses - then I always planned to support a 'coding for billing' step in which the authoritative record is used to generate the ICD-10 codes. But you're right - perhaps some will find the need to enter ICD-10 codes, and my wider conceptual model is that we have software that will need to ingest coded information in a variety of coding systems and make sense of them.
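A rough sketch of what such a 'coding for billing' step could look like, written in Python for illustration. The `code_for_billing` function and the tiny lookup table are assumptions, not anything in Hermes. Real SNOMED CT to ICD-10 maps are rule-based and can carry several targets and map advice per concept, which this flattens into a plain dictionary.

```python
# Illustrative crossmap only. A real service would read the map reference
# sets shipped with a SNOMED CT release rather than a hand-written table.
ICD10_MAP = {
    "24700007": ["G35"],    # multiple sclerosis
    "22298006": ["I21.9"],  # myocardial infarction
}


def code_for_billing(diagnoses, crossmap):
    """Derive ICD-10 codes from a list of SNOMED CT concept identifiers.

    Returns (codes, unmapped): de-duplicated ICD-10 codes in first-seen
    order, plus the concept identifiers with no map target, so that they
    can be reviewed by a person instead of being silently dropped.
    """
    codes, unmapped = [], []
    for concept_id in diagnoses:
        targets = crossmap.get(concept_id)
        if not targets:
            unmapped.append(concept_id)
            continue
        for target in targets:
            if target not in codes:
                codes.append(target)
    return codes, unmapped


# "0000" is a placeholder for a concept that has no map target.
codes, unmapped = code_for_billing(["24700007", "22298006", "0000"], ICD10_MAP)
print(codes, unmapped)
```

The SNOMED CT list stays the authoritative record; the ICD-10 codes are derived output that can be regenerated whenever the map is updated.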

As to LOINC, I agree with you - most of the use case is behind-the-scenes for laboratory tests - because LOINC seems to me to be a set of data used to build a local standard for that laboratory - a subset essentially. So you are reinforcing my previously held belief that search is provided more for internal lab users for configuration of their tests, but that the value of providing LOINC to more general software is to help them make sense of what that code means. That means two things - firstly, search can be provided and can simply look at the 6-component model of LOINC and the "relatednames" property - in order to help specific users understand and find things in LOINC - and secondly, that for clinical decision making, making the LOINC model available will aid understanding of that test result - even if that software doesn't have understanding built-in - e.g. oh that test is a trinucleotide repeat disorder test so I'll show it over there in the genetics section.
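To make the 6-component model concrete, here is a small Python sketch that splits a colon-separated LOINC fully specified name into its six parts (component, property, time, system, scale, method). The `LoincName` type and `parse_fully_specified_name` function are hypothetical and are not taken from oink. The sketch assumes the parts themselves contain no colons and that the method part may be empty or missing.

```python
from typing import NamedTuple


class LoincName(NamedTuple):
    component: str  # what is measured, e.g. "Glucose"
    prop: str       # kind of property, e.g. "MCnc" (mass concentration)
    time: str       # timing, e.g. "Pt" (point in time)
    system: str     # specimen or system, e.g. "Ser/Plas"
    scale: str      # scale, e.g. "Qn" (quantitative)
    method: str     # method, often empty


def parse_fully_specified_name(name):
    """Split 'Component:Property:Time:System:Scale[:Method]' into parts."""
    parts = name.split(":")
    if len(parts) == 5:
        parts.append("")  # method omitted entirely
    if len(parts) != 6:
        raise ValueError(f"expected 5 or 6 colon-separated parts: {name!r}")
    return LoincName(*parts)


parsed = parse_fully_specified_name("Glucose:MCnc:Pt:Ser/Plas:Qn")
print(parsed.component, parsed.system, parsed.scale)
```

Software with no built-in knowledge of a given test could still group or route results on fields such as `system` or `component`.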

But it would be helpful to make the non-laboratory LOINC components - e.g. lists of specialties, document metadata etc. - searchable and understandable.

Thanks Sidharth - I'll make a simple proof-of-concept service available for LOINC initially and iterate as I learn more.

sidharthramesh commented 3 years ago

Also, just an update: I've made a repository to automatically download SNOMED packages from the MLDS, which is used by the following countries for distributing SNOMED: India, Argentina, Belgium, Denmark, Estonia, Ireland, Malaysia, Netherlands, New Zealand, Norway, Sweden, Uruguay. It's similar to the work you've done for TRUD in the UK.

It automatically downloads the files, builds a Docker image that imports and indexes the files, and exposes Hermes inside the container. This can be run in the cloud using any of the container deployment products - Google Cloud Run, Azure Container Instances and AWS Fargate. I have a CI/CD pipeline set up with Google Cloud Build and it's very convenient! Versioning of local and international builds is as simple as creating a new tag.

I'm still using an old version of Hermes because I see that you've disabled exposing it to 0.0.0.0. How does it work in the latest version?

Also, the compact step fails on some CI/CD platforms like GitHub Actions and CircleCI because of how much memory it needs. Any idea on how to reduce memory consumption?

Everything else is super smooth!

wardle commented 3 years ago

Thanks Sidharth. There's a new bind-address option, so you can simply configure it at runtime. There shouldn't be any difficulty in upgrading. You'll find import dramatically faster.

Glad it is working for you - that's exactly how I intended it to be used. This idea of a national terminology server which is cared for and upgraded is old-fashioned - much better to switch your live service at the gateway level and even keep historic data services running if that's what you want.

Compaction is a function of the MapDB library. Other key-value stores don't need it, and certainly don't have the same memory requirement. The options are: a) you should still be able to pass a heap size to the running Java executable; b) I could use an alternative key-value store - e.g. I have used LMDB for the key-value backend of the LOINC tooling as an experiment, and the way I've written hermes it would be fairly easy to switch to an alternative key-value store; c) don't worry about compaction - it probably doesn't matter too much.

wardle commented 3 years ago

See this code for bind-address configuration.

wardle commented 3 years ago

PS your automatic download / build pipeline looks good.