icgc-argo / argo-clinical

Clinical data submission for ARGO programs.
GNU Affero General Public License v3.0

🐛 Stored data inconsistency - therapies stored separately from their treatment records #1186

Open joneubank opened 2 months ago

joneubank commented 2 months ago

Describe the bug

Cases have been found in the clinical database where submitted Treatment Details data is stored in a new, otherwise empty treatment record instead of inside the therapies array of its corresponding treatment.

The expected data structure would look like:

{
    "treatments": [
        {
            "clinicalInfo":{
                "program_id" : "TEST-CA",
                "submitter_donor_id" : "qwerty1234",
                "submitter_treatment_id" : "abcd1234"
            },

            "therapies" : [
                {
                    "clinicalInfo" : {
                            "program_id" : "TEST-CA",
                            "submitter_donor_id" : "qwerty1234",
                            "submitter_treatment_id" : "abcd1234",
                            "drug_rxnormcui" : "32592",
                            "drug_name" : "Oxaliplatin"
                    },
                    "therapyType" : "chemotherapy"
                }
            ],
            "treatmentId" : 1234
        }
    ]
}

But instead it is being inserted as two separate treatment records:

{
    "treatments": [
        {
            "therapies" : [
                {
                    "clinicalInfo" : {
                            "program_id" : "TEST-CA",
                            "submitter_donor_id" : "qwerty1234",
                            "submitter_treatment_id" : "abcd1234",
                            "drug_rxnormcui" : "32592",
                            "drug_name" : "Oxaliplatin"
                    },
                    "therapyType" : "chemotherapy"
                }
            ],
            "treatmentId" : 1235
        },
        {
            "clinicalInfo":{
                "program_id" : "TEST-CA",
                "submitter_donor_id" : "qwerty1234",
                "submitter_treatment_id" : "abcd1234"
            },
            "therapies" : [ ],
            "treatmentId" : 1234
        }
    ]
}
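To locate affected records, one option is to scan a donor's treatments for entries that have therapies but no clinicalInfo of their own, and pair each with the treatment that shares its submitter_treatment_id. The following is a minimal sketch under stated assumptions: the interfaces are hypothetical and only mirror the JSON shown in this issue (they are not the actual argo-clinical types), and `findOrphanedTreatments` is an illustrative name.

```typescript
// Hypothetical shapes modelled on the JSON in this issue, not the real argo-clinical types.
interface ClinicalInfo {
  [field: string]: unknown;
  submitter_treatment_id?: string;
}

interface Therapy {
  clinicalInfo: ClinicalInfo;
  therapyType: string;
}

interface Treatment {
  clinicalInfo?: ClinicalInfo; // missing on the "orphaned" records described above
  therapies: Therapy[];
  treatmentId: number;
}

interface OrphanPair {
  orphan: Treatment;
  parent: Treatment | undefined; // undefined when no matching treatment exists
}

// Returns every treatment that lacks clinicalInfo, paired with the treatment
// (if any) whose clinicalInfo carries the same submitter_treatment_id.
function findOrphanedTreatments(treatments: Treatment[]): OrphanPair[] {
  const parents = new Map<string, Treatment>();
  for (const t of treatments) {
    const id = t.clinicalInfo?.submitter_treatment_id;
    if (id !== undefined) {
      parents.set(id, t);
    }
  }

  return treatments
    .filter((t) => t.clinicalInfo === undefined)
    .map((orphan): OrphanPair => {
      // Assumption: all therapies in one orphan share a submitter_treatment_id,
      // so the first therapy is enough to identify the intended parent.
      const id = orphan.therapies[0]?.clinicalInfo.submitter_treatment_id;
      return { orphan, parent: id === undefined ? undefined : parents.get(id) };
    });
}
```

Run against the second structure above, this would pair treatmentId 1235 (the orphan) with treatmentId 1234 (its intended parent). It only reports the pairs and leaves any repair to a separate step.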

Steps To Reproduce

It is not yet known how to reproduce this. We have attempted to recreate the submission, but the data was stored in the expected structure. As part of this ticket, we want to review the submission code to identify any conditions or edge cases that could cause this.

We believe all the affected treatment and therapy records were created as part of the same clinical data submission, since their generated Treatment IDs are close to each other (within 20) but not consecutive, and the order is random (some treatments come before their therapies and some after).

Ask @joneubank or @UmmulkiramR for example cases.

Objectives

edsu7 commented 1 month ago

Copied from Slack for historical records: Steve from POG-CA is trying to submit clinical data but is being blocked by an error. Following validation, he gets the message ‘an error has occurred’.

I took a look at the GraphQL query, and it's:

{
    "errors": [
        {
            "message": "{\"error\":\"TypeError\",\"message\":\"Cannot destructure property 'interval_of_followup' of 'clinicalInfo' as it is undefined.\"}",
            "locations": [],
            "path": [
                "validateClinicalSubmissions"
            ],
            "extensions": {
                "code": "INTERNAL_SERVER_ERROR"
            }
        }
    ],
    "data": null
}

I think what's happening is that when Steve designates a donor as lost to follow up via submitter_treatment_id 920cec14-2059-5aff-b1da-6671530c7601_3, Clinical encounters the “orphaned” treatment (treatmentId 2598) that's missing clinicalInfo.
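Independently of fixing the stored data, the validation step could check for a missing clinicalInfo before reading fields from it, so the submitter sees a descriptive error rather than a bare TypeError. A rough sketch follows, assuming a hypothetical `readFollowUpInterval` helper and record shape; neither is taken from the argo-clinical code, and only the field name interval_of_followup comes from the error above.

```typescript
// Hypothetical record shape; only the field names come from this issue.
interface StoredRecord {
  clinicalInfo?: Record<string, unknown>;
  treatmentId: number;
}

type IntervalResult =
  | { ok: true; interval: unknown }
  | { ok: false; reason: string };

// Reads interval_of_followup without assuming clinicalInfo exists.
function readFollowUpInterval(record: StoredRecord): IntervalResult {
  if (record.clinicalInfo === undefined) {
    // Destructuring clinicalInfo here would throw the TypeError from the response above.
    return {
      ok: false,
      reason: `treatment ${record.treatmentId} has no clinicalInfo (possibly an orphaned therapy record)`,
    };
  }
  return { ok: true, interval: record.clinicalInfo["interval_of_followup"] };
}
```

Returning a result object instead of throwing is just one design choice; the point is that the orphaned record is named in the message, which would have identified treatmentId 2598 directly.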

Additional context: there is a bug when treatment details are submitted before the treatment. Clinical assigns each of them its own ARGO treatment ID, which results in an incorrect data shape: a treatmentDetails record lacking treatment info, and vice versa.

    {
      "therapies": [
        {
          "clinicalInfo": {
            "program_id": "POG-CA",
            "submitter_donor_id": "30253",
            "submitter_treatment_id": "920cec14-2059-5aff-b1da-6671530c7601_3",
            "drug_rxnormcui": "1727455",
            "drug_name": "Alectinib",
            "chemotherapy_drug_dose_units": "mg/m2",
            "prescribed_cumulative_drug_dose": null,
            "actual_cumulative_drug_dose": 13200,
            "dose_intensity_reduction": null,
            "dose_intensity_reduction_event": null,
            "dose_intensity_reduction_amount": null
          },
          "therapyType": "chemotherapy"
        }
      ],
      "treatmentId": 2598
    },
    {
      "clinicalInfo": {
        "program_id": "POG-CA",
        "submitter_donor_id": "30253",
        "submitter_treatment_id": "920cec14-2059-5aff-b1da-6671530c7601_3",
        "submitter_primary_diagnosis_id": "POG550_1.0",
        "treatment_type": [
          "Chemotherapy"
        ],
        "is_primary_treatment": "Unknown",
        "line_of_treatment": null,
        "treatment_start_interval": 1180,
        "treatment_duration": 1079,
        "days_per_cycle": null,
        "number_of_cycles": null,
        "treatment_intent": "Palliative",
        "treatment_setting": "Adjuvant",
        "response_to_treatment_criteria_method": "Physician Assessed Response Criteria",
        "response_to_treatment": "Physician assessed partial response",
        "outcome_of_treatment": null,
        "toxicity_type": null,
        "hematological_toxicity": null,
        "adverse_events": null,
        "clinical_trials_database": null,
        "clinical_trial_number": null
      },
      "therapies": [],
      "treatmentId": 2646
    }
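If submission order is indeed the trigger, one way to make the result order-independent is to look up (or create) the treatment by its submitter_treatment_id before attaching either the treatment info or a therapy, so both always land on the same record and share one ID. The sketch below is an in-memory illustration only: `TreatmentStore` and its methods are hypothetical names, and a real key would presumably also include the program and donor.

```typescript
// Hypothetical shapes modelled on the JSON above, not the real argo-clinical types.
interface Info {
  submitter_treatment_id: string;
  [field: string]: unknown;
}

interface TherapyDoc {
  clinicalInfo: Info;
  therapyType: string;
}

interface TreatmentDoc {
  clinicalInfo?: Info;
  therapies: TherapyDoc[];
  treatmentId: number;
}

class TreatmentStore {
  private nextId: number;
  private bySubmitterId = new Map<string, TreatmentDoc>();

  constructor(firstId = 1) {
    this.nextId = firstId;
  }

  // A single lookup path: an ID is generated only the first time a
  // submitter_treatment_id is seen, whichever record type arrives first.
  private getOrCreate(submitterTreatmentId: string): TreatmentDoc {
    let doc = this.bySubmitterId.get(submitterTreatmentId);
    if (doc === undefined) {
      doc = { therapies: [], treatmentId: this.nextId++ };
      this.bySubmitterId.set(submitterTreatmentId, doc);
    }
    return doc;
  }

  addTreatment(info: Info): void {
    this.getOrCreate(info.submitter_treatment_id).clinicalInfo = info;
  }

  addTherapy(therapy: TherapyDoc): void {
    this.getOrCreate(therapy.clinicalInfo.submitter_treatment_id).therapies.push(therapy);
  }

  all(): TreatmentDoc[] {
    return Array.from(this.bySubmitterId.values());
  }
}

// Usage: therapy first, then its treatment, as in the suspected failing order.
const store = new TreatmentStore(2598);
store.addTherapy({
  clinicalInfo: { submitter_treatment_id: "920cec14-2059-5aff-b1da-6671530c7601_3", drug_name: "Alectinib" },
  therapyType: "chemotherapy",
});
store.addTreatment({ submitter_treatment_id: "920cec14-2059-5aff-b1da-6671530c7601_3", program_id: "POG-CA" });
```

With this approach a therapy that arrives first creates a placeholder treatment that is filled in later. An alternative would be to reject therapies whose treatment is not yet present; which is preferable depends on how the submission pipeline orders the files.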
edsu7 commented 1 month ago

Additional investigation with Ummulkiram.

MUTO-INTL was able to submit lost_to_follow_up despite the incorrect data shape.

We suspect they uploaded all 3 files (donor, treatment and treatmentDetails), bypassing the database check. This still creates the incorrect data shape, however.