ohdsi-studies / PioneerWatchfulWaiting

This study is part of the joint PIONEER - EHDEN - OHDSI studyathon in March 2021, and aims to advance understanding of clinical management and outcomes of watchful waiting in prostate cancer.
Apache License 2.0

Issue with cohort id-s #64

Closed marek05 closed 3 years ago

marek05 commented 3 years ago

An issue I noticed:

During target cohort creation, the log says:

2021-07-08 12:35:11 [Main thread] INFO PioneerWatchfulWaiting instantiateCohortSet 20/70: Instantiation cohort [PIONEER T5a] Symptom post conservative management_broad (120.sql)
2021-07-08 12:35:32 [Main thread] INFO PioneerWatchfulWaiting instantiateCohortSet 21/70: Instantiation cohort [PIONEER T3a sen1] new PCa conservative management (334.sql)
2021-07-08 12:35:34 [Main thread] INFO PioneerWatchfulWaiting instantiateCohortSet 22/70: Instantiation cohort [PIONEER T3a sen2] new PCa conservative management (335.sql)
2021-07-08 12:35:37 [Main thread] INFO PioneerWatchfulWaiting instantiateCohortSet 23/70: Instantiation cohort [PIONEER T3a sen3] new PCa conservative management (336.sql)
2021-07-08 12:35:39 [Main thread] INFO PioneerWatchfulWaiting instantiateCohortSet 24/70: Instantiation cohort [PIONEER T3a sen4] new PCa conservative management (337.sql)
2021-07-08 12:40:47 [Main thread] INFO PioneerWatchfulWaiting instantiateCohortSet 25/70: Instantiation cohort [PIONEER T3a sen5] new PCa conservative management (338.sql)

When it gets to the strata cohorts, the log says:

2021-07-08 12:42:03 [Main thread] INFO PioneerWatchfulWaiting instantiateCohortSet 59/70: Instantiation cohort [PIONEER S27] Any malignancy, except malignant neoplasm of skin (327.sql)
2021-07-08 12:42:05 [Main thread] INFO PioneerWatchfulWaiting isTaskRequired Skipping cohortId = '334' because unchanged from earlier run
2021-07-08 12:42:05 [Main thread] INFO PioneerWatchfulWaiting isTaskRequired Skipping cohortId = '335' because unchanged from earlier run
2021-07-08 12:42:05 [Main thread] INFO PioneerWatchfulWaiting isTaskRequired Skipping cohortId = '336' because unchanged from earlier run
2021-07-08 12:42:05 [Main thread] INFO PioneerWatchfulWaiting instantiateCohortSet 63/70: Instantiation cohort [PIONEER S31] Total Cardiovascular Disease Event (328.sql)
....
2021-07-08 12:43:26 [Main thread] INFO PioneerWatchfulWaiting isTaskRequired Skipping cohortId = '337' because unchanged from earlier run
2021-07-08 12:43:26 [Main thread] INFO PioneerWatchfulWaiting isTaskRequired Skipping cohortId = '338' because unchanged from earlier run

Basically, cohorts 334, 335 and 336 (and, further down, 337 and 338) were already created during the target cohorts step, so they are skipped here.
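A minimal sketch of how such a skip could come about, written in Python for illustration. This is not the package's code: the log only shows that a step called isTaskRequired skips a cohortId that is "unchanged from earlier run". The function name `is_task_required`, the `record` dictionary and the idea of keying on cohortId plus a checksum of the SQL are all assumptions.

```python
import hashlib


def is_task_required(cohort_id, sql_text, record):
    """Return True if this cohortId still needs to be instantiated.

    `record` is a hypothetical dict mapping cohortId -> checksum of the SQL
    that was last run for it. The cohort's name is never consulted, so two
    different cohorts sharing an id and a .sql file look identical.
    """
    checksum = hashlib.md5(sql_text.encode("utf-8")).hexdigest()
    if record.get(cohort_id) == checksum:
        return False  # same id, same SQL: treated as already done
    record[cohort_id] = checksum
    return True


record = {}
sql_334 = "-- placeholder for the contents of 334.sql"

# First request comes from the target list, second from the strata list.
first = is_task_required("334", sql_334, record)
second = is_task_required("334", sql_334, record)
```

Under these assumptions `first` is true and `second` is false, which matches the pattern in the log: the id is instantiated once and skipped when it comes up again.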

The target cohorts list contains newer cohorts:

https://github.com/ohdsi-studies/PioneerWatchfulWaiting/blob/master/inst/settings/CohortsToCreateTarget.csv#L22

 name | atlasName                                          | atlasId | cohortId
------+----------------------------------------------------+---------+----------
 334  | [PIONEER T3a sen1] new PCa conservative management | 182     | 334
 335  | [PIONEER T3a sen2] new PCa conservative management | 173     | 335
 336  | [PIONEER T3a sen3] new PCa conservative management | 174     | 336
 337  | [PIONEER T3a sen4] new PCa conservative management | 175     | 337
 338  | [PIONEER T3a sen5] new PCa conservative management | 176     | 338

The strata cohorts list also contains cohorts with the same cohortIds: https://github.com/ohdsi-studies/PioneerWatchfulWaiting/blob/master/inst/settings/CohortsToCreateStrata.csv#L29

 name | atlasName                                                                      | atlasId | cohortId
------+--------------------------------------------------------------------------------+---------+----------
 334  | [PIONEER S28] Performance status ECOG=0                                        | 164     | 334
 335  | [PIONEER S29] Performance status ECOG=1                                        | 165     | 335
 336  | [PIONEER S30] Performance status ECOG=2+                                       | 166     | 336
 337  | [PIONEER S37] Anxiety                                                          | 158     | 337
 338  | [PIONEER S38] Prevalent Asthma or Chronic obstructive pulmonary disease (COPD) | 159     | 338

Judging by the cohort names, these are not the same cohorts, but because the cohort ids are the same, the package treats them as the same cohort. I think the strata definitions are the ones that actually get run.

@keesvanbochove and @bdemeulder, can you check whether the cohort ids should be as they are?
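The kind of check that would surface this clash can be sketched in a few lines of Python. The CSV text below holds only the rows quoted above (the real files contain more), the comma-separated layout is assumed, and `find_id_collisions` is a hypothetical helper, not part of the package.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the two lists quoted above; the real files have more rows.
TARGET_CSV = """name,atlasName,atlasId,cohortId
334,[PIONEER T3a sen1] new PCa conservative management,182,334
335,[PIONEER T3a sen2] new PCa conservative management,173,335
336,[PIONEER T3a sen3] new PCa conservative management,174,336
337,[PIONEER T3a sen4] new PCa conservative management,175,337
338,[PIONEER T3a sen5] new PCa conservative management,176,338
"""

STRATA_CSV = """name,atlasName,atlasId,cohortId
334,[PIONEER S28] Performance status ECOG=0,164,334
335,[PIONEER S29] Performance status ECOG=1,165,335
336,[PIONEER S30] Performance status ECOG=2+,166,336
337,[PIONEER S37] Anxiety,158,337
338,[PIONEER S38] Prevalent Asthma or Chronic obstructive pulmonary disease (COPD),159,338
"""


def find_id_collisions(named_csvs):
    """Map each cohortId claimed by more than one distinct atlasName
    to the (list name, atlasName) pairs that claim it."""
    owners = defaultdict(list)
    for list_name, text in named_csvs.items():
        for row in csv.DictReader(io.StringIO(text)):
            owners[row["cohortId"]].append((list_name, row["atlasName"]))
    return {
        cohort_id: claims
        for cohort_id, claims in owners.items()
        if len({atlas_name for _, atlas_name in claims}) > 1
    }


collisions = find_id_collisions({"target": TARGET_CSV, "strata": STRATA_CSV})
for cohort_id in sorted(collisions):
    print(cohort_id, collisions[cohort_id])
```

Comparing on atlasName rather than on the id alone means a cohort that is deliberately listed in both files under the same name would not be flagged.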

bdemeulder commented 3 years ago

So there is a discrepancy between inst/settings/CohortsToCreateTarget.csv and inst/settings/diagnostics/CohortsToCreateTarget.csv.

@denyskaduk, could we just change their name from 334:338 to 121:125?
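A rough sketch of that renumbering, in Python for illustration. It assumes the rows are dictionaries with the columns shown in the lists above, that `name` and `cohortId` are changed together, and that the new ids must not already be in use; `renumber` is a hypothetical helper. Whether the matching .sql files need renaming as well is not covered here.

```python
def renumber(rows, id_map):
    """Return copies of `rows` with name and cohortId remapped via `id_map`.

    Raises ValueError if a new id is already used by a row that is not
    itself being renumbered, so the fix cannot create a fresh clash.
    """
    kept_ids = {row["cohortId"] for row in rows if row["cohortId"] not in id_map}
    clashes = kept_ids & set(id_map.values())
    if clashes:
        raise ValueError(f"new ids already in use: {sorted(clashes)}")

    result = []
    for row in rows:
        new_row = dict(row)  # leave the input untouched
        new_id = id_map.get(row["cohortId"])
        if new_id is not None:
            new_row["name"] = new_id
            new_row["cohortId"] = new_id
        result.append(new_row)
    return result


# 334:338 -> 121:125, as proposed in the comment above.
ID_MAP = {str(old): str(new) for old, new in zip(range(334, 339), range(121, 126))}

target_rows = [
    {"name": "334", "atlasName": "[PIONEER T3a sen1] new PCa conservative management",
     "atlasId": "182", "cohortId": "334"},
    {"name": "338", "atlasName": "[PIONEER T3a sen5] new PCa conservative management",
     "atlasId": "176", "cohortId": "338"},
]
renumbered = renumber(target_rows, ID_MAP)
```

The atlasId column is left alone, since it refers to the ATLAS definition rather than to the id used inside the package.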

bdemeulder commented 3 years ago

@keesvanbochove Just let me know if you don't have time to fix this one, then I will do it.

keesvanbochove commented 3 years ago

@bdemeulder I can work on it tomorrow afternoon or Thursday, but we also have to check the actual cohort definitions against these lists. For example, cohort 338 is listed in both CohortsToCreateTarget and CohortsToCreateStrata, but 338.sql can only be one of those: does the query create a target cohort with PCa patients, or is it a COPD stratum cohort? Looking at the concept codes, I'm pretty sure it's the stratum:

select * from CONCEPT where concept_id in (255573,258780);
 concept_id |           concept_name           | domain_id | vocabulary_id | concept_class_id | standard_concept | concept_code | valid_start_date | valid_end_date | invalid_reason 
------------+----------------------------------+-----------+---------------+------------------+------------------+--------------+------------------+----------------+----------------
     255573 | Chronic obstructive lung disease | Condition | SNOMED        | Clinical Finding | S                | 13645005     | 1970-01-01       | 2099-12-31     | 
     258780 | Emphysematous bronchitis         | Condition | SNOMED        | Clinical Finding | S                | 185086009    | 1970-01-01       | 2099-12-31     | 
(2 rows)

@denyskaduk do we actually need these extra sen1-5 target cohorts?

Also, the fact that this doesn't match up makes me a bit worried. Hopefully there are no other mistakes in the cohort lists, because if the SQL and the cohort identification don't match up, we may be looking at incorrect data.

bdemeulder commented 3 years ago

The cohort lists are updated and checked.