4 Update statements then process the remainder of the flags from the baseline temp tables into #cohort ending around line 449.
I propose the following refactorization of all these processes described, such that after building the #cohort table, these flags can be calculated and applied in one transaction with the below. All domains are extracted in one trip to the underlying fact table, the MDVisit_pname variables are calculated first at the required provider_id grain, and then the aggregate is adjusted to the patient_num level. The other DX/PX/Lab feature names can all be processed simultaneously. I could find no other reference in the procedure to these baseline tables beyond this flag calculation that require them to be stored on disk. This optimization, when combined with the #cohort optimization I offered in https://github.com/i2b2plugins/loyalty_cohort/issues/6 in our environment, brings the bulk of the processing steps (cohort+flags) down to sub-30s. I would be interested to see if other sites with much larger warehouses see any gains in this approach.
;WITH CTE_PARAMS AS (
select distinct Feature_name, concept_cd, 'DX' as CodeType --[ACT_PATH],
from [dbo].[xref_LoyaltyCode_paths] L, ACT.concept_dimension c
where C.CONCEPT_PATH like L.Act_path+'%' --jgk: must support local children
AND [code type] = 'DX'
and (act_path <> '**Not Found' and act_path is not null)
UNION
select Distinct Feature_name, concept_cd, 'PX' as CodeType--[ACT_PATH],
from [dbo].[xref_LoyaltyCode_paths] L, ACT.concept_dimension c
where C.CONCEPT_PATH like L.Act_path+'%' -- jgk
AND [code type] = 'PX'
and (act_path <> '**Not Found' and act_path is not null)
UNION
select distinct Feature_name, concept_cd, 'Lab' as CodeType --[ACT_PATH],
from [dbo].[xref_LoyaltyCode_paths] L, ACT.concept_dimension c
where C.CONCEPT_PATH like L.Act_path+'%' -- jgk
AND [code type] = 'lab'
and (act_path <> '**Not Found' and act_path is not null)
)
UPDATE #COHORT
SET MDVisit_pname2 = CF.MDVisit_pname2
, MDVisit_pname3 = CF.MDVisit_pname3
, Mammography = CF.Mammography
, BMI = CF.BMI
, FluShot = CF.FluShot
, PneumococcalVaccine = CF.PneumococcalVaccine
, MedicalExam = CF.MedicalExam
, FecalOccultTest = CF.FecalOccultTest
, Paptest = CF.Paptest
, Colonoscopy = CF.Colonoscopy
, PSATest = CF.PSATest
, A1C = CF.A1C
, Routine_Care_2 = CF.Routine_Care_2
FROM #COHORT C,
(
SELECT PATIENT_NUM, MDVisit_pname2, MDVisit_pname3, Mammography, BMI, FluShot, PneumococcalVaccine, MedicalExam, FecalOccultTest, Paptest, Colonoscopy, PSATest, A1C
, CASE WHEN (MedicalExam+Mammography+PSATest+Colonoscopy+FecalOccultTest+FluShot+PneumococcalVaccine+A1C+BMI)>=2 THEN 1 ELSE 0 END AS Routine_Care_2
FROM (
SELECT PATIENT_NUM, MAX(MDVisit_pname2) as MDVisit_pname2, MAX(MDVisit_pname3) as MDVisit_pname3, MAX(Mammography) as Mammography, MAX(BMI) as BMI, MAX(FluShot) as FluShot
, MAX(PneumococcalVaccine) as PneumococcalVaccine, MAX(MedicalExam) as MedicalExam, MAX(FecalOccultTest) as FecalOccultTest, MAX(Paptest) as Paptest, MAX(Colonoscopy) as Colonoscopy
, MAX(PSATest) as PSATest, MAX(A1C) as A1C
FROM (
SELECT O.PATIENT_NUM
, CASE WHEN COUNT(DISTINCT CASE WHEN P.CodeType = 'PX' AND (O.PROVIDER_ID is not null and O.provider_id <> '' and O.provider_id <> '@') THEN CONVERT(DATE,O.START_DATE) ELSE NULL END) = 2 THEN 1 ELSE 0 END AS MDVisit_pname2
, CASE WHEN COUNT(DISTINCT CASE WHEN P.CodeType = 'PX' AND (O.PROVIDER_ID is not null and O.provider_id <> '' and O.provider_id <> '@') THEN CONVERT(DATE,O.START_DATE) ELSE NULL END) > 2 THEN 1 ELSE 0 END AS MDVisit_pname3
, MAX(CASE WHEN P.Feature_name = 'Mammography' THEN 1 ELSE 0 END) AS Mammography
, MAX(CASE WHEN P.Feature_name = 'BMI' THEN 1 ELSE 0 END) AS BMI
, MAX(CASE WHEN P.Feature_name = 'Flu Shot' THEN 1 ELSE 0 END) AS FluShot
, MAX(CASE WHEN P.Feature_name = 'Pneumococcal vaccine' THEN 1 ELSE 0 END) AS PneumococcalVaccine
, MAX(CASE WHEN P.Feature_name = 'Medical Exam' THEN 1 ELSE 0 END) AS MedicalExam
, MAX(CASE WHEN P.Feature_name = 'Fecal occult blood test' THEN 1 ELSE 0 END) AS FecalOccultTest
, MAX(CASE WHEN P.Feature_name = 'Pap test' THEN 1 ELSE 0 END) AS Paptest
, MAX(CASE WHEN P.Feature_name = 'Colonoscopy' THEN 1 ELSE 0 END) AS Colonoscopy
, MAX(CASE WHEN P.Feature_name = 'PSA Test' THEN 1 ELSE 0 END) AS PSATest
, MAX(CASE WHEN P.Feature_name = 'A1C' THEN 1 ELSE 0 END) AS A1C
from ACT.observation_fact o, CTE_PARAMS p
where o.CONCEPT_CD = p.CONCEPT_CD
AND o.START_DATE >= dateadd(yy,-1, @indexDate)
AND o.START_DATE < @indexDate
GROUP BY O.PATIENT_NUM, O.PROVIDER_ID /* AGG AT PROVIDER_ID FIRST LEVEL FOR THE MDVisit_pname VARIABLES */
)PA /* PROVIDER GRAIN AGG->BASELINE -- AGG AT PATIENT_NUM GRAIN */
GROUP BY PA.PATIENT_NUM
)BL /* BASELINE COLLECTED FOR ALL DOMAINS */
)CF /* COHORT FLAGS INCLUDED ROUTINE_CARE_2 */
WHERE C.PATIENT_NUM = CF.PATIENT_NUM;
Starting at line 159-239: https://github.com/i2b2plugins/loyalty_cohort/blob/8bbd4d2e542db4040806fb5a357014cf19f6f996/usp_LoyaltyCohort_v6.sql#L159
Three temp tables are extracted from the facts called #dx_baseline, #px_baseline, and #lab_baseline. These are later utilized to produce a number of cohort flags starting with MEDVisit_pname2 at line 379: https://github.com/i2b2plugins/loyalty_cohort/blob/8bbd4d2e542db4040806fb5a357014cf19f6f996/usp_LoyaltyCohort_v6.sql#L379
4 Update statements then process the remainder of the flags from the baseline temp tables into #cohort ending around line 449.
I propose the following refactorization of all these processes described, such that after building the #cohort table, these flags can be calculated and applied in one transaction with the below. All domains are extracted in one trip to the underlying fact table, the MDVisit_pname variables are calculated first at the required provider_id grain, and then the aggregate is adjusted to the patient_num level. The other DX/PX/Lab feature names can all be processed simultaneously. I could find no other reference in the procedure to these baseline tables beyond this flag calculation that require them to be stored on disk. This optimization, when combined with the #cohort optimization I offered in https://github.com/i2b2plugins/loyalty_cohort/issues/6 in our environment, brings the bulk of the processing steps (cohort+flags) down to sub-30s. I would be interested to see if other sites with much larger warehouses see any gains in this approach.