Optimization for #<code_type>_baseline tables for cohort flags

Starting at line 159-239: https://github.com/i2b2plugins/loyalty_cohort/blob/8bbd4d2e542db4040806fb5a357014cf19f6f996/usp_LoyaltyCohort_v6.sql#L159

Three temp tables are extracted from the facts called #dx_baseline, #px_baseline, and #lab_baseline. These are later utilized to produce a number of cohort flags starting with MEDVisit_pname2 at line 379: https://github.com/i2b2plugins/loyalty_cohort/blob/8bbd4d2e542db4040806fb5a357014cf19f6f996/usp_LoyaltyCohort_v6.sql#L379

4 Update statements then process the remainder of the flags from the baseline temp tables into #cohort ending around line 449.

I propose the following refactorization of all these processes described, such that after building the #cohort table, these flags can be calculated and applied in one transaction with the below. All domains are extracted in one trip to the underlying fact table, the MDVisit_pname variables are calculated first at the required provider_id grain, and then the aggregate is adjusted to the patient_num level. The other DX/PX/Lab feature names can all be processed simultaneously. I could find no other reference in the procedure to these baseline tables beyond this flag calculation that require them to be stored on disk. This optimization, when combined with the #cohort optimization I offered in https://github.com/i2b2plugins/loyalty_cohort/issues/6 in our environment, brings the bulk of the processing steps (cohort+flags) down to sub-30s. I would be interested to see if other sites with much larger warehouses see any gains in this approach.

;WITH CTE_PARAMS AS (
select distinct Feature_name, concept_cd, 'DX' as CodeType --[ACT_PATH], 
from [dbo].[xref_LoyaltyCode_paths] L, ACT.concept_dimension c
where C.CONCEPT_PATH like L.Act_path+'%'  --jgk: must support local children
AND [code type] = 'DX'
and (act_path <> '**Not Found' and act_path is not null)
UNION 
select Distinct Feature_name, concept_cd, 'PX' as CodeType--[ACT_PATH], 
from [dbo].[xref_LoyaltyCode_paths] L, ACT.concept_dimension c
where  C.CONCEPT_PATH like L.Act_path+'%' -- jgk
AND [code type] = 'PX'
and (act_path <> '**Not Found' and act_path is not null)
UNION
select distinct Feature_name, concept_cd, 'Lab' as CodeType --[ACT_PATH], 
from [dbo].[xref_LoyaltyCode_paths] L, ACT.concept_dimension c
where  C.CONCEPT_PATH like L.Act_path+'%' -- jgk
AND [code type] = 'lab'
and (act_path <> '**Not Found' and act_path is not null) 
)
UPDATE #COHORT
SET MDVisit_pname2       = CF.MDVisit_pname2
, MDVisit_pname3         = CF.MDVisit_pname3
, Mammography            = CF.Mammography
, BMI                    = CF.BMI
, FluShot                = CF.FluShot
, PneumococcalVaccine    = CF.PneumococcalVaccine
, MedicalExam            = CF.MedicalExam
, FecalOccultTest        = CF.FecalOccultTest
, Paptest                = CF.Paptest
, Colonoscopy            = CF.Colonoscopy
, PSATest                = CF.PSATest
, A1C                    = CF.A1C
, Routine_Care_2         = CF.Routine_Care_2
FROM #COHORT C,
(
  SELECT PATIENT_NUM, MDVisit_pname2, MDVisit_pname3, Mammography, BMI, FluShot, PneumococcalVaccine, MedicalExam, FecalOccultTest, Paptest, Colonoscopy, PSATest, A1C
    , CASE WHEN (MedicalExam+Mammography+PSATest+Colonoscopy+FecalOccultTest+FluShot+PneumococcalVaccine+A1C+BMI)>=2 THEN 1 ELSE 0 END AS Routine_Care_2
  FROM (
    SELECT PATIENT_NUM, MAX(MDVisit_pname2) as MDVisit_pname2, MAX(MDVisit_pname3) as MDVisit_pname3, MAX(Mammography) as Mammography, MAX(BMI) as BMI, MAX(FluShot) as FluShot
    , MAX(PneumococcalVaccine) as PneumococcalVaccine, MAX(MedicalExam) as MedicalExam, MAX(FecalOccultTest) as FecalOccultTest, MAX(Paptest) as Paptest, MAX(Colonoscopy) as Colonoscopy
    , MAX(PSATest) as PSATest, MAX(A1C) as A1C
    FROM (
      SELECT O.PATIENT_NUM
      , CASE WHEN COUNT(DISTINCT CASE WHEN P.CodeType = 'PX' AND (O.PROVIDER_ID is not null and O.provider_id <> '' and O.provider_id <> '@') THEN CONVERT(DATE,O.START_DATE) ELSE NULL END)  = 2 THEN 1 ELSE 0 END AS MDVisit_pname2
      , CASE WHEN COUNT(DISTINCT CASE WHEN P.CodeType = 'PX' AND (O.PROVIDER_ID is not null and O.provider_id <> '' and O.provider_id <> '@') THEN CONVERT(DATE,O.START_DATE) ELSE NULL END)  > 2 THEN 1 ELSE 0 END AS MDVisit_pname3
      , MAX(CASE WHEN P.Feature_name = 'Mammography' THEN 1 ELSE 0 END) AS Mammography
      , MAX(CASE WHEN P.Feature_name = 'BMI' THEN 1 ELSE 0 END) AS BMI
      , MAX(CASE WHEN P.Feature_name = 'Flu Shot' THEN 1 ELSE 0 END) AS FluShot
      , MAX(CASE WHEN P.Feature_name = 'Pneumococcal vaccine' THEN 1 ELSE 0 END) AS PneumococcalVaccine
      , MAX(CASE WHEN P.Feature_name = 'Medical Exam' THEN 1 ELSE 0 END) AS MedicalExam
      , MAX(CASE WHEN P.Feature_name = 'Fecal occult blood test' THEN 1 ELSE 0 END) AS FecalOccultTest
      , MAX(CASE WHEN P.Feature_name = 'Pap test' THEN 1 ELSE 0 END) AS Paptest
      , MAX(CASE WHEN P.Feature_name = 'Colonoscopy' THEN 1 ELSE 0 END) AS Colonoscopy
      , MAX(CASE WHEN P.Feature_name = 'PSA Test' THEN 1 ELSE 0 END) AS PSATest
      , MAX(CASE WHEN P.Feature_name = 'A1C' THEN 1 ELSE 0 END) AS A1C
      from ACT.observation_fact o, CTE_PARAMS p
      where o.CONCEPT_CD = p.CONCEPT_CD
      AND o.START_DATE >=  dateadd(yy,-1, @indexDate)
      AND o.START_DATE < @indexDate
      GROUP BY O.PATIENT_NUM, O.PROVIDER_ID /* AGG AT PROVIDER_ID FIRST LEVEL FOR THE MDVisit_pname VARIABLES */
    )PA /* PROVIDER GRAIN AGG->BASELINE -- AGG AT PATIENT_NUM GRAIN */
    GROUP BY PA.PATIENT_NUM
  )BL /* BASELINE COLLECTED FOR ALL DOMAINS */
)CF /* COHORT FLAGS INCLUDED ROUTINE_CARE_2 */
WHERE C.PATIENT_NUM = CF.PATIENT_NUM;

i2b2plugins / loyalty_cohort

Optimization for #<code_type>_baseline tables for cohort flags #10