jhu-bids / TermHub

Web app and CLI tools for working with biomedical terminologies. https://github.com/orgs/jhu-bids/projects/9/views/7
https://bit.ly/termhub
GNU General Public License v3.0

Create csets: combo drugs, w/ include/exclude mono/combo filters #505

Open Sigfried opened 1 year ago

Sigfried commented 1 year ago

I was working with @stephanieshong on concept set 387143023 / Sulfonylureas (v4), a drug class that includes a number of ingredients that also appear in combo drugs. Unfortunately (I don't know why), including only the class concept 21600749: Sulfonylureas would miss many appropriate concepts, so Stephanie included 19 other ancestor concepts as well:

select * from concepts_with_counts where concept_id in (select concept_id from concept_set_version_item where codeset_id = 387143023 and "isExcluded" is not true) order by 4,5,2;
┌────────────┬───────────────────────────────────────┬───────────┬──────────────────┬──────────────────────┬
│ concept_id │             concept_name              │ domain_id │  vocabulary_id   │   concept_class_id   │
├────────────┼───────────────────────────────────────┼───────────┼──────────────────┼──────────────────────┼
│   21600749 │ Sulfonylureas                         │ Drug      │ ATC              │ ATC 4th              │
│   21600767 │ metformin and sulfonylureas; systemic │ Drug      │ ATC              │ ATC 5th              │
│   21600766 │ phenformin and sulfonylureas; oral    │ Drug      │ ATC              │ ATC 5th              │
│   36214822 │ chlorpropamide Pill                   │ Drug      │ RxNorm           │ Clinical Dose Group  │
│    1594975 │ chlorpropamide 100 MG Oral Tablet     │ Drug      │ RxNorm           │ Clinical Drug        │
│   40030456 │ chlorpropamide Oral Tablet            │ Drug      │ RxNorm           │ Clinical Drug Form   │
│    1530014 │ acetohexamide                         │ Drug      │ RxNorm           │ Ingredient           │
│   19033498 │ carbutamide                           │ Drug      │ RxNorm           │ Ingredient           │
│    1594973 │ chlorpropamide                        │ Drug      │ RxNorm           │ Ingredient           │
│   19001409 │ glibornuride                          │ Drug      │ RxNorm           │ Ingredient           │
│   19059796 │ gliclazide                            │ Drug      │ RxNorm           │ Ingredient           │
│    1597756 │ glimepiride                           │ Drug      │ RxNorm           │ Ingredient           │
│    1560171 │ glipizide                             │ Drug      │ RxNorm           │ Ingredient           │
│   19097821 │ gliquidone                            │ Drug      │ RxNorm           │ Ingredient           │
│    1559684 │ glyburide                             │ Drug      │ RxNorm           │ Ingredient           │
│   19001441 │ glymidine                             │ Drug      │ RxNorm           │ Ingredient           │
│    1502809 │ tolazamide                            │ Drug      │ RxNorm           │ Ingredient           │
│    1502855 │ tolbutamide                           │ Drug      │ RxNorm           │ Ingredient           │
│   36027702 │ chlorpropamide / phenformin           │ Drug      │ RxNorm           │ Multiple Ingredients │
│   40798860 │ Glisoxepide                           │ Drug      │ RxNorm Extension │ Ingredient           │
└────────────┴───────────────────────────────────────┴───────────┴──────────────────┴──────────────────────┴

This added an additional 1079 appropriate concepts, though only three of the ancestors (glyburide, glipizide, glimepiride) produced descendants (20) that had patient counts. But adding these 19 concepts and their descendants also brought in a number of concepts that were metformin monotherapies and not Sulfonylureas. So she then excluded a number of concepts and their descendants, which in turn excluded combo drugs that were Sulfonylureas.

Ideally, we would like to be able to:

Given some examples:

1. combo we want
   40164911 │ metformin hydrochloride 500 MG / repaglinide 2 MG Oral Tablet [PrandiMet]
   36890445 │ Glyburide 5 MG / Metformin 750 MG [Glucovance]

2. mono we do want
   1597756 │ glimepiride                           │ Drug      │ RxNorm           │ Ingredient
   1597761 │ glimepiride 1 MG Oral Tablet          │ Drug      │ RxNorm           │ Clinical Drug

3. mono we don't want
   40163926 │ 24 HR metformin hydrochloride 500 MG Extended Release Oral Tablet [Glucophage]

we asked: Can we write a query that would give us 1 and 2 but not 3? We tried to find ancestors that would help us get the ones we wanted and exclude the ones we didn't, but that was not easy. We still don't know if it's possible for this example, let alone more generally.

As an approach to excluding monotherapy of non-desired ingredients (in this case only metformin) we tried getting rid of any drug whose name contained 'metformin' but did not contain a slash. This worked satisfactorily (and it turned out only a few drugs needed to be excluded.)

select * 
from concept_ancestor_plus 
where ancestor_concept_id in (1502809,40030456,1597756,19033498,1559684,1594975,21600749,36027702,1594973,40798860,19097821,1502855,1560171,19001409,36214822,1530014,21600766,19059796,19001441) 
and lower(concept_name_2) like '%metformin%' and concept_name_2 not like '%/%';
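
To illustrate how this name-based filter behaves on the example drugs listed earlier, here is a minimal sketch that applies the same two `like` conditions to an in-memory SQLite table. The table name `descendants` is a hypothetical stand-in for the relevant rows of `concept_ancestor_plus`, and SQLite stands in for the real database; only the concept IDs and names come from this issue.

```python
import sqlite3

# Example names copied from this issue; `descendants` is a hypothetical toy table.
rows = [
    (40164911, "metformin hydrochloride 500 MG / repaglinide 2 MG Oral Tablet [PrandiMet]"),
    (36890445, "Glyburide 5 MG / Metformin 750 MG [Glucovance]"),
    (1597761, "glimepiride 1 MG Oral Tablet"),
    (40163926, "24 HR metformin hydrochloride 500 MG Extended Release Oral Tablet [Glucophage]"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE descendants (concept_id INTEGER, concept_name_2 TEXT)")
con.executemany("INSERT INTO descendants VALUES (?, ?)", rows)

# Same heuristic as the query above: the name mentions metformin but has no
# slash, so it is presumed to be a metformin monotherapy and flagged for exclusion.
to_exclude = [
    r[0]
    for r in con.execute(
        """
        SELECT concept_id
        FROM descendants
        WHERE lower(concept_name_2) LIKE '%metformin%'
          AND concept_name_2 NOT LIKE '%/%'
        ORDER BY concept_id
        """
    )
]
print(to_exclude)
```

On these four rows only the Glucophage monotherapy is flagged; the two combos are kept because their names contain a slash. The heuristic depends entirely on naming conventions, so it would miss a combo whose name lacks a slash or a monotherapy whose name happens to contain one.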

Now we want to address two problems:

  1. Can we come up with a more general algorithm for getting all drugs with a desired ingredient (or any ingredient of a class) without also pulling in descendant concepts that lack those ingredients; and
  2. What features could we add to TermHub that could help with this process (with or without such an algorithm)?

TermHub's concept hierarchy was a big help, but we still had to do most of this work in SQL. Ideas? @hlehmann17? @DaveraGabriel? Others?

joeflack4 commented 1 year ago

I think that this is a very specific problem, and it would take too much time to make it a near-term priority for the TermHub UI.

This does look mostly like a SQL problem. Maybe I'm not understanding the problem fully, but might the algorithm not be as simple as this?

  1. Select (i) mono-drugs with those ingredients, (ii) combo-drugs with those ingredients, and (iii) mono-drugs without the ingredients we want that got included as descendants of combo-drugs.
  2. Filter out 'iii', i.e. those where concept_relationship does not have any row for <mono_drug_concept_id>, 'RxNorm has ing', <target_ingredient_concept_id>.
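
A rough sketch of the filter in step 2, assuming the candidate drugs from step 1 have already been gathered somehow (that part is not shown). It uses an in-memory SQLite database as a stand-in; `candidates` and `target_ingredients` are hypothetical helper tables, `METFORMIN_ID` is a placeholder rather than a real concept ID, and the `concept_relationship` rows are toy data. Whether drug-level concepts actually carry a direct 'RxNorm has ing' row to their ingredients is an assumption that would need checking against the real vocabulary.

```python
import sqlite3

METFORMIN_ID = 999  # hypothetical placeholder, not a real concept ID

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE candidates (concept_id INTEGER);
    CREATE TABLE target_ingredients (concept_id INTEGER);
    CREATE TABLE concept_relationship (
        concept_id_1 INTEGER, relationship_id TEXT, concept_id_2 INTEGER
    );
    """
)
# Candidate drugs (IDs from this issue): glimepiride tablet, Glucovance, Glucophage.
con.executemany("INSERT INTO candidates VALUES (?)",
                [(1597761,), (36890445,), (40163926,)])
# Desired ingredients: glimepiride and glyburide.
con.executemany("INSERT INTO target_ingredients VALUES (?)",
                [(1597756,), (1559684,)])
# Toy relationship rows, assumed for illustration only.
con.executemany(
    "INSERT INTO concept_relationship VALUES (?, ?, ?)",
    [
        (1597761, "RxNorm has ing", 1597756),
        (36890445, "RxNorm has ing", 1559684),
        (36890445, "RxNorm has ing", METFORMIN_ID),
        (40163926, "RxNorm has ing", METFORMIN_ID),
    ],
)

# Keep a candidate only if at least one of its ingredients is a target ingredient.
kept = [
    r[0]
    for r in con.execute(
        """
        SELECT c.concept_id
        FROM candidates c
        WHERE EXISTS (
            SELECT 1
            FROM concept_relationship cr
            WHERE cr.concept_id_1 = c.concept_id
              AND cr.relationship_id = 'RxNorm has ing'
              AND cr.concept_id_2 IN (SELECT concept_id FROM target_ingredients)
        )
        ORDER BY c.concept_id
        """
    )
]
print(kept)
```

Under these toy rows the glimepiride tablet and the glyburide/metformin combo are kept, and the metformin-only drug is dropped. Keeping anything with at least one target ingredient is a slight simplification of the step as written, since it does not need to distinguish mono from combo drugs first.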

Sigfried commented 1 year ago

The problem described here has come up repeatedly over the past few months. One very costly solution is proposed in my OHDSI symposium submission from a couple of months ago, but that goes beyond single concept sets, so TermHub can't implement it without greater changes elsewhere. Your solution would be great if we had a way of doing i, ii, and iii, but we don't. That's what the issue is about.