This one has been quite a journey, I have added and removed code many times as the existing setup was more complex than I anticipated. For details on what changes and why see the Jira ticket. Code has been tested in pilot using prod inputs (data loaded in hmfpatients_pilot). Commits are all named the same and are not very specific so ignore those and just look at the total result pls.
The changes add:
A column cohortId to sample table
Tumor curation information for samples that are curated in LAMA (everyting since LAMA v2)
Considerations for the future:
We should make sure that also CPCT and DRUP samples can get their tumor curation information from LAMA (currently they always use ECRF information only)
Avoid using the proxy lims.json and use LAMA directly instead
Make curation sample based rather than patient (currently we simply take the first sample, but samples can have different tumor type)
Add WIDE clinical data processing
Testing:
# Currently 7416 samples in both prod and pilot
SELECT COUNT(*) FROM hmfpatients.sample; # 7416
SELECT COUNT(*) FROM hmfpatients_pilot.sample; # 7416
# Example query to check differences in curation between prod and pilot
SELECT
prod.sampleId AS sampleId,
CONCAT_WS("|", prod.primaryTumorLocation, prod.primaryTumorSubLocation, prod.primaryTumorType, prod.primaryTumorSubType, prod.primaryTumorExtraDetails ) AS prod,
CONCAT_WS("|", pilot.primaryTumorLocation, pilot.primaryTumorSubLocation, pilot.primaryTumorType, pilot.primaryTumorSubType, pilot.primaryTumorExtraDetails ) AS pilot
FROM
(SELECT sampleId, primaryTumorLocation, primaryTumorSubLocation, primaryTumorType, primaryTumorSubType, primaryTumorExtraDetails
FROM hmfpatients.sample LEFT JOIN hmfpatients.baseline ON hmfpatients.baseline.patientId = hmfpatients.sample.patientId) as prod
JOIN
(SELECT sampleId, primaryTumorLocation, primaryTumorSubLocation, primaryTumorType, primaryTumorSubType, primaryTumorExtraDetails
FROM hmfpatients_pilot.sample LEFT JOIN hmfpatients_pilot.baseline ON hmfpatients_pilot.baseline.patientId = hmfpatients_pilot.sample.patientId) as pilot
ON prod.sampleId = pilot.sampleId;
This one has been quite a journey, I have added and removed code many times as the existing setup was more complex than I anticipated. For details on what changes and why see the Jira ticket. Code has been tested in pilot using prod inputs (data loaded in hmfpatients_pilot). Commits are all named the same and are not very specific so ignore those and just look at the total result pls.
The changes add:
cohortId
tosample
tableConsiderations for the future:
lims.json
and use LAMA directly insteadTesting: