I made a mistake when first designing the specifications for the survival plot. Both ICGC and GDC use precalculated fields that define the survival interval, meaning the interval from the first diagnosis to the last gathered follow-up (at which point the participant is recorded as either deceased or alive).
KidsFirst source data does not precalculate this survival interval; it keeps all of the original dates instead. To calculate survival correctly, we first need to compute the survival interval and send that information to SurvivalPy.
To correctly calculate the survival interval:
[x] Add diagnosis.age_at_event_days to the sqon
[x] Create a survival interval variable, calculated as outcome.age_at_event_days - min(diagnosis.age_at_event_days)
[ ] Verify that the output results are comparable to the example GDC data
Note: the results will not be 100% the same, as there is one fewer participant in KF, but they should be very similar.
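The interval calculation in the checklist above can be sketched as follows. This is only a minimal illustration, not the actual kf-arranger or SurvivalPy code: the function name, the shape of the participant record, and the choice to return None when no diagnosis age is available are all assumptions made for the example, and the numbers are made up.

```python
from typing import Iterable, Optional


def survival_interval_days(
    outcome_age_at_event_days: Optional[int],
    diagnosis_ages_at_event_days: Iterable[Optional[int]],
) -> Optional[int]:
    """Days from the earliest diagnosis to the outcome event.

    Implements outcome.age_at_event_days - min(diagnosis.age_at_event_days).
    Returning None for missing data is a hypothetical policy, not a spec.
    """
    # Ignore diagnoses that have no recorded age.
    ages = [age for age in diagnosis_ages_at_event_days if age is not None]
    if outcome_age_at_event_days is None or not ages:
        return None
    return outcome_age_at_event_days - min(ages)


# Hypothetical participant record with made-up values, for illustration only.
participant = {
    "outcome": {"age_at_event_days": 2100},
    "diagnoses": [{"age_at_event_days": 400}, {"age_at_event_days": 365}],
}

interval = survival_interval_days(
    participant["outcome"]["age_at_event_days"],
    (d["age_at_event_days"] for d in participant["diagnoses"]),
)
print(interval)  # 1735, i.e. 2100 - 365
```

Taking the minimum over all diagnoses anchors the interval at the first diagnosis, which matches the definition of the survival interval used by the precalculated ICGC and GDC fields.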
We updated the input data to match the corrected requirements and ensured the service uses Python 3. This worked in local testing, but it has not produced the desired results when deployed.
survival estimate to https://github.com/kids-first/kf-arranger/blob/master/src/endpoints/survival.js#L74
TEST DATA
Example of correctly calculated survival data:
GDC Target-NBL Survival calculation results: https://portal.gdc.cancer.gov/auth/api/v0/analysis/survival?filters=%7B%22content%22%3A%5B%7B%22content%22%3A%7B%22field%22%3A%22cases.case_id%22%2C%22value%22%3A%5B%22set_id%3AAWnHjCbzcg-xDiQUTMKT%22%5D%7D%2C%22op%22%3A%22IN%22%7D%5D%2C%22op%22%3A%22AND%22%7D
GRAPHQL QUERY FOR KF DATA WITH A DATASET COMPARABLE TO THE EXAMPLE DATA
REFERENCE: GDC API implementation: https://github.com/NCI-GDC/gdcapi/blob/55055d55bbd8a8532f583c4ec956e23977e65675/gdcapi/services/analysis/survival.py