kids-first / kf-portal-ui

:bar_chart: The Kids First Data Resource Portal and Social Network User Interface
Apache License 2.0

Fix Survival Calculation #1500

Closed rosibaj closed 5 years ago

rosibaj commented 5 years ago

I made a mistake when first designing the specifications for the survival plot. Both ICGC and GDC use precalculated fields that define the survival interval: the time from the first diagnosis to the last recorded follow-up, at which point the participant is either deceased or alive.

KidsFirst source data does not precalculate this survival interval; it keeps all of the original dates instead. To calculate survival correctly, we first need to compute the survival interval and pass it to SurvivalPy.
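A minimal sketch of that calculation, assuming the interval is the outcome's `age_at_event_days` minus the earliest diagnosis `age_at_event_days`, and that `vital_status` takes the values "Deceased" (death observed) or "Alive" (censored at last follow-up). The function name, the return shape, and the handling of missing or unexpected values are assumptions for illustration, not the portal's or SurvivalPy's actual interface.

```python
from typing import Optional, Sequence, Tuple


def survival_interval(
    diagnosis_ages_days: Sequence[Optional[int]],
    outcome_age_days: Optional[int],
    vital_status: Optional[str],
) -> Optional[Tuple[int, bool]]:
    """Return (interval_days, death_observed), or None if it cannot be computed."""
    # Ignore diagnoses that have no recorded age.
    known_ages = [age for age in diagnosis_ages_days if age is not None]
    if not known_ages or outcome_age_days is None:
        return None
    # Assumption: only these two statuses are usable for a survival curve.
    if vital_status not in ("Alive", "Deceased"):
        return None
    # The interval starts at the first (earliest) diagnosis.
    interval = outcome_age_days - min(known_ages)
    if interval < 0:
        # Follow-up recorded before the first diagnosis: treat as unusable.
        return None
    return interval, vital_status == "Deceased"
```

Participants for which the function returns `None` would be left out of the data sent on for plotting; whether that is the right policy for Kids First data is a separate decision.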

To correctly calculate the survival interval:

TEST DATA

Example of correctly calculated survival data:

GDC Target-NBL Survival calculation results: https://portal.gdc.cancer.gov/auth/api/v0/analysis/survival?filters=%7B%22content%22%3A%5B%7B%22content%22%3A%7B%22field%22%3A%22cases.case_id%22%2C%22value%22%3A%5B%22set_id%3AAWnHjCbzcg-xDiQUTMKT%22%5D%7D%2C%22op%22%3A%22IN%22%7D%5D%2C%22op%22%3A%22AND%22%7D

GRAPHQL QUERY FOR KF DATA WITH A DATASET SIMILAR TO THE EXAMPLE DATA

query ($sqon: JSON, $size: Int, $offset: Int) {
  participant {
    hits(filters: $sqon, first: $size, offset: $offset) {
      edges {
        node {
          kf_id
          external_id
          diagnoses {
            hits {
              edges {
                node {
                  age_at_event_days
                }
              }
            }
          }
          outcome {
            age_at_event_days
            vital_status
          }
        }
      }
    }
  }
}
{
  "sqon": {
    "op": "and",
    "content": [
      {
        "op": "in",
        "content": {
          "field": "study.short_name",
          "value": ["TARGET: Neuroblastoma"]
        }
      }
    ]
  },
  "size": 500,
  "offset": 0
}
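The response to this query is nested several levels deep, so it may help to flatten it before computing anything. The sketch below assumes the response follows the shape of the query above, wrapped in the usual GraphQL `data` key; the function name and the output field names are hypothetical.

```python
from typing import Any, Dict, List


def participants_from_response(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the participant query response into one dict per participant."""
    rows = []
    for edge in response["data"]["participant"]["hits"]["edges"]:
        node = edge["node"]
        # diagnoses is a nested hits/edges list; tolerate it being absent or null.
        diagnosis_hits = (node.get("diagnoses") or {}).get("hits") or {}
        diagnosis_edges = diagnosis_hits.get("edges") or []
        # outcome is a single object in the query; tolerate null.
        outcome = node.get("outcome") or {}
        rows.append({
            "kf_id": node["kf_id"],
            "diagnosis_ages_days": [
                d["node"].get("age_at_event_days") for d in diagnosis_edges
            ],
            "outcome_age_days": outcome.get("age_at_event_days"),
            "vital_status": outcome.get("vital_status"),
        })
    return rows
```

Each row then carries the diagnosis ages and the outcome fields needed to derive a survival interval for that participant.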

REFERENCE: GDC API implementation: https://github.com/NCI-GDC/gdcapi/blob/55055d55bbd8a8532f583c4ec956e23977e65675/gdcapi/services/analysis/survival.py

joneubank commented 5 years ago

We updated the input data to match the corrected requirements and ensured the service runs on Python 3. This worked in local testing, but it has not produced the desired results when deployed.