i2b2plugins / loyalty_cohort

Uses heuristics to find patients who get the majority of their care from your hospital system
Other
1 stars 3 forks source link

Issue21 - Non-demographic concepts #22

Open DarrenWHenderson opened 3 years ago

DarrenWHenderson commented 3 years ago

This approach had the least impact on performance, and we can drop in as many other paths as we want to exclude from the >= 2012 fact lookup in the visit pull in step 1.

The resolves https://github.com/i2b2plugins/loyalty_cohort/issues/21

jklann commented 3 years ago

This does not perform well on our large observation_fact table. Part of the reason is that concept_dimension has 1.4m concepts in it, but observation_fact has only 144k! I ran into the same problem with totalnum, and solved it by paying the performance penalty to enumerate distinct concept_cds in obs_fact when creating the temp table: totalnum counter(lines 35-48).

But I'm trying that approach on your filter and it doesn't solve the performance problem.

This gets really tricky when you approach 1b facts. Any ideas??

DarrenWHenderson commented 3 years ago

Instead of distinct concepts, we could just pull in prefixes. Then try to filter obs fact with a like predicate.

jklann commented 3 years ago

Hard numbers: this takes 11 minutes on our db. My attempt at optimizing ran in 8 minutes, which doesn't seem nearly optimized enough. I like your prefix idea.

DarrenWHenderson commented 3 years ago

Try the code in this commit: https://github.com/DarrenWHenderson/loyalty_cohort/commit/1b05fb914b542d6cf747a69169cb8096df52ca9a

It flips the demographic look. Store just the paths/concept_cd we deem to be "demographics". Then rewrite the join in the visit step to be a left join from observation_fact to this temp table of demographic codes. It's snappy on our data. YMMV.

jklann commented 3 years ago

building your #cohort table took 11 minutes. I don't remember how long the old version took....