knowledgesystems / pipelines-scrum

Repository for tracking uncategorizable issues related to backend pipelines work

clarify clinical data join strategy #1155

Closed averyniceday closed 1 year ago

averyniceday commented 1 year ago

Done Condition (What do we need? Why do we need it? Keep this as small as possible!)

Technical Description (How are we going to achieve the above)

Potential Issues

Dependencies

Technical Requirements

Outside People/Teams

Changes

sheridancbio commented 1 year ago

A third option is being developed now: limiting the focus to samples that can be linked to a patient through an available DMP patient identifier. The DMP patient identifier will be obtained from the crosswalk table (along with a "proper" MRN). The DMP patient ID can be used to search for matches in the prefix of the dmp_sample_ids present in table_pathology_impact_sample_summary_dop_anno.tsv, which will give the sample collection date. In parallel, the MRN can be used to retrieve birth dates from the ddp_demographics vds in cdm_data.
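A rough sketch of that join, for discussion only. The field names (`dmp_patient_id`, `mrn`, `dmp_sample_id`, `collection_date`) and the in-memory row format are assumptions, not the actual column names in the crosswalk table or the TSV. It also assumes a sample ID is the patient ID followed by `-` and a suffix, so the match is done on the patient ID plus the delimiter to keep one patient ID from matching a longer one.

```python
def link_samples(crosswalk_rows, sample_rows, birth_dates_by_mrn):
    """Join samples to patients by DMP patient ID prefix, then attach birth dates by MRN.

    All field names here are hypothetical placeholders for the real columns.
    """
    linked = []
    for patient in crosswalk_rows:
        dmp_id = patient["dmp_patient_id"]
        if not dmp_id:
            # No DMP patient identifier available: out of scope for this option.
            continue
        # Assumption: sample IDs look like "<dmp_patient_id>-<suffix>".
        prefix = dmp_id + "-"
        for sample in sample_rows:
            if sample["dmp_sample_id"].startswith(prefix):
                linked.append({
                    "dmp_patient_id": dmp_id,
                    "dmp_sample_id": sample["dmp_sample_id"],
                    "collection_date": sample["collection_date"],
                    # Birth date is looked up separately via the MRN; None if absent.
                    "birth_date": birth_dates_by_mrn.get(patient["mrn"]),
                })
    return linked
```

The nested loop is quadratic and only meant to show the matching rule; indexing samples by their extracted patient prefix would be the obvious change for real table sizes.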

sheridancbio commented 1 year ago

When this is worked out, we will compare the new results against the original results (from when we connected the MRN directly to ddp_demographics) and share lists of example discrepancies with the cfDNA team. We can also correct the truncated MRNs we encountered (leading zeros missing) to get a better correspondence before doing the diff.
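Something like the following could cover both steps, as a sketch under assumptions. The MRN width of 8 is a guess that should be checked against the source data, and both result sets are assumed to be available as `{patient_key: birth_date}` mappings; `normalize_mrn` and `diff_links` are hypothetical helper names.

```python
MRN_WIDTH = 8  # assumption: the full MRN width is not stated in this issue


def normalize_mrn(mrn, width=MRN_WIDTH):
    """Restore leading zeros that were dropped from a numeric MRN."""
    return str(mrn).strip().zfill(width)


def diff_links(original, revised):
    """Compare two {patient_key: birth_date} mappings and list the discrepancies.

    Returns (key, original_value, revised_value) tuples for every key whose
    value differs or that is present in only one mapping (missing side is None).
    """
    discrepancies = []
    for key in sorted(set(original) | set(revised)):
        before = original.get(key)
        after = revised.get(key)
        if before != after:
            discrepancies.append((key, before, after))
    return discrepancies
```

Normalizing the MRNs in the original results before calling `diff_links` should keep truncation alone from showing up as a discrepancy.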

sheridancbio commented 1 year ago

A fourth approach was chosen and evaluated. We are now using two tables in cdm to link CMO patient IDs to DMP IDs, avoiding the use of MRNs altogether. As a result, 6 patients lost their birth date link because of missing DMP IDs. This has been communicated to Chris Fong for remedy.
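For reference, a minimal sketch of a two-table lookup that also reports the patients who drop out. The two cdm tables and their keys are not named in this thread, so the shape here is an assumption: one mapping from CMO patient ID to some shared key, and a second from that key to DMP ID. All names are hypothetical.

```python
def link_cmo_to_dmp(cmo_to_key, key_to_dmp):
    """Chain two lookups (CMO patient ID -> shared key -> DMP ID).

    Both mappings are hypothetical stand-ins for the two cdm tables.
    Returns (linked, unlinked): a {cmo_id: dmp_id} dict and a list of
    CMO patient IDs for which no DMP ID could be found.
    """
    linked = {}
    unlinked = []
    for cmo_id, key in cmo_to_key.items():
        dmp_id = key_to_dmp.get(key)
        if dmp_id:
            linked[cmo_id] = dmp_id
        else:
            # Missing or empty DMP ID: this patient loses the birth date link.
            unlinked.append(cmo_id)
    return linked, unlinked
```

The `unlinked` list is the kind of output that could be handed off when reporting patients with missing DMP IDs.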