DarrenWHenderson opened 3 years ago
This does not perform well on our large observation_fact table. Part of the problem is that concept_dimension holds 1.4m concepts, but observation_fact actually uses only 144k of them! I ran into the same issue with totalnum and solved it by paying the performance penalty to enumerate the distinct concept_cds in observation_fact when creating the temp table: totalnum counter (lines 35-48).
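For context, the distinct-enumeration workaround looks roughly like this. A minimal sketch only: the temp-table names are illustrative, and the column list is trimmed to the i2b2 star-schema columns relevant here, not the actual totalnum script.

```sql
-- Sketch: pay the one-time cost of enumerating the concept_cds that
-- actually appear in observation_fact (~144k) instead of repeatedly
-- joining against all 1.4m rows of concept_dimension.
SELECT DISTINCT f.concept_cd
INTO #fact_concepts            -- hypothetical temp-table name
FROM observation_fact f;

-- Downstream joins then hit the small temp table first:
SELECT c.concept_path, c.concept_cd
INTO #used_concepts            -- hypothetical temp-table name
FROM concept_dimension c
JOIN #fact_concepts fc
  ON fc.concept_cd = c.concept_cd;
```

The trade-off is one full scan of observation_fact up front in exchange for cheaper joins afterward, which is why it helps totalnum but, as noted below, may not help every filter.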
But I'm trying that approach on your filter and it doesn't solve the performance problem.
This gets really tricky when you approach 1b facts. Any ideas??
Instead of distinct concepts, we could just pull in path prefixes, then try to filter observation_fact with a LIKE predicate.
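Something like the following, maybe. This is only a sketch of the prefix idea: the prefix length, the temp-table name, and the demographics path are all assumptions, not taken from the actual script.

```sql
-- Sketch: collect short concept_path prefixes for the branch of the
-- ontology we care about (path shown is illustrative).
SELECT DISTINCT LEFT(c.concept_path, 30) AS path_prefix
INTO #prefixes                 -- hypothetical temp-table name
FROM concept_dimension c
WHERE c.concept_path LIKE '\ACT\Demographics\%';

-- Then filter facts via a LIKE against the handful of prefixes,
-- rather than joining on 1.4m individual concept rows.
SELECT f.patient_num, f.concept_cd
FROM observation_fact f
JOIN concept_dimension c
  ON c.concept_cd = f.concept_cd
JOIN #prefixes p
  ON c.concept_path LIKE p.path_prefix + '%';
```

Whether this beats the distinct-concept approach will depend on how selective the prefixes are and on the indexing of concept_path.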
Hard numbers: this takes 11 minutes on our db. My attempt at optimizing ran in 8 minutes, which doesn't seem nearly optimized enough. I like your prefix idea.
Try the code in this commit: https://github.com/DarrenWHenderson/loyalty_cohort/commit/1b05fb914b542d6cf747a69169cb8096df52ca9a
It flips the demographic lookup: store just the paths/concept_cds we deem to be "demographics", then rewrite the join in the visit step as a left join from observation_fact to this temp table of demographic codes. It's snappy on our data. YMMV.
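If you can't pull the commit, the shape of the change is roughly this. A sketch under assumptions: the temp-table name, demographics path, and date literal are illustrative stand-ins for what the commit actually uses.

```sql
-- 1. Materialize only the concept_cds we consider "demographics".
SELECT DISTINCT c.concept_cd
INTO #demo_codes               -- hypothetical temp-table name
FROM concept_dimension c
WHERE c.concept_path LIKE '\ACT\Demographics\%';  -- illustrative path

-- 2. In the visit step, LEFT JOIN from observation_fact to the small
--    temp table and keep only the non-demographic facts (anti-join).
SELECT f.patient_num, f.start_date
FROM observation_fact f
LEFT JOIN #demo_codes d
  ON d.concept_cd = f.concept_cd
WHERE d.concept_cd IS NULL
  AND f.start_date >= '2012-01-01';   -- the >= 2012 fact lookup
```

The point of the flip is that observation_fact is scanned once against a tiny code list instead of being joined to the full concept_dimension.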
Building your #cohort table took 11 minutes. I don't remember how long the old version took....
This approach had the least performance impact, and we can drop in as many additional paths as we want to exclude from the >= 2012 fact lookup in the visit pull in step 1.
This resolves https://github.com/i2b2plugins/loyalty_cohort/issues/21