monarch-initiative / oncoexporter

Cancer data to GA4GH phenopacket
https://monarch-initiative.github.io/oncoexporter
MIT License

Validate survival variables #82

Closed: ielis closed this issue 5 months ago

ielis commented 6 months ago

Currently we take survival info from CDA.

This is what should be done. The method to calculate the survival time in days depends on the vital status (Alive/Deceased).
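A minimal sketch of the vital-status-dependent rule, assuming GDC-style fields `days_to_death` and `days_to_last_follow_up` that are both counted in days from diagnosis. These field names are an assumption for illustration, not necessarily what CDA returns. A deceased patient contributes an observed event; a living patient is right-censored at the last follow-up.

```python
from typing import Optional, Tuple


def survival_from_vital_status(
    vital_status: str,
    days_to_death: Optional[int],
    days_to_last_follow_up: Optional[int],
) -> Tuple[int, bool]:
    """Return (survival_days, event_observed) for one patient.

    Assumes both day counts are measured from the date of diagnosis.
    """
    if vital_status == "Deceased":
        # The event (death) was observed, so the survival time is exact.
        if days_to_death is None:
            raise ValueError("Deceased patient without days_to_death")
        return days_to_death, True
    if vital_status == "Alive":
        # No event yet: the patient is right-censored at the last follow-up.
        if days_to_last_follow_up is None:
            raise ValueError("Alive patient without days_to_last_follow_up")
        return days_to_last_follow_up, False
    raise ValueError(f"Unsupported vital status: {vital_status!r}")
```

Raising on missing values (rather than returning a default) keeps incomplete records from silently entering a survival analysis.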

Alternatively, we can fetch the survival data directly from GDC, in which case we only need to fetch two columns:

The GDC API
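A rough sketch of querying the GDC `/analysis/survival` endpoint (the endpoint mentioned later in this thread) with the standard library only. The filter syntax (a JSON list with one filter per survival curve) and the response layout (`results` → `donors`, each with `id`, `time`, `censored`, `survivalEstimate`) are assumptions based on our reading of the GDC API and should be checked against the GDC documentation; `build_survival_url`, `parse_survival_response` and `SurvivalRecord` are hypothetical names.

```python
import json
from typing import List, NamedTuple
from urllib.parse import urlencode
from urllib.request import urlopen

GDC_SURVIVAL_URL = "https://api.gdc.cancer.gov/analysis/survival"


class SurvivalRecord(NamedTuple):
    case_id: str
    time_days: float
    censored: bool


def build_survival_url(project_id: str) -> str:
    # Assumed filter format: a list with one filter object per survival curve.
    filters = [{"op": "=",
                "content": {"field": "cases.project.project_id", "value": project_id}}]
    return GDC_SURVIVAL_URL + "?" + urlencode({"filters": json.dumps(filters)})


def parse_survival_response(payload: dict) -> List[SurvivalRecord]:
    # Assumed layout: {"results": [{"donors": [{"id", "time", "censored", ...}]}]}
    records = []
    for curve in payload.get("results", []):
        for donor in curve.get("donors", []):
            records.append(SurvivalRecord(donor["id"], donor["time"], donor["censored"]))
    return records


def fetch_survival(project_id: str) -> List[SurvivalRecord]:
    # Network call; not exercised offline.
    with urlopen(build_survival_url(project_id)) as response:
        return parse_survival_response(json.load(response))
```

Keeping parsing separate from the HTTP call makes it easy to test the mapping against a saved response.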

justaddcoffee commented 6 months ago

@sujaypatil96 how hard would it be to incorporate this into the code that does the call to GDC?

msierk commented 6 months ago

It seems easier to just use the GDC API, since GDC has already calculated the survival time and provides the censoring info. We should get both the censored and the survival estimate values. We should probably also do the calculation from the CDA data and compare the two, but initially the GDC values give us data we can use for a Cox regression so we can move forward with the analysis part of the manuscript.
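The suggested CDA-versus-GDC comparison could look roughly like this, assuming both sources have already been reduced to a mapping from case ID to survival time in days. `compare_survival` and the `tolerance_days` parameter are hypothetical, introduced only for illustration.

```python
from typing import Dict, List, Tuple


def compare_survival(
    cda: Dict[str, int],
    gdc: Dict[str, int],
    tolerance_days: int = 0,
) -> Tuple[Dict[str, Tuple[int, int]], List[str]]:
    """Return (mismatches, case IDs present in only one source)."""
    shared = cda.keys() & gdc.keys()
    # Cases whose survival times differ by more than the tolerance.
    mismatches = {
        case_id: (cda[case_id], gdc[case_id])
        for case_id in sorted(shared)
        if abs(cda[case_id] - gdc[case_id]) > tolerance_days
    }
    # Symmetric difference: cases missing from one of the two sources.
    missing = sorted(cda.keys() ^ gdc.keys())
    return mismatches, missing
```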

justaddcoffee commented 6 months ago

@sujaypatil96 said he would have a try at pulling survival time from GDC

sujaypatil96 commented 6 months ago

I'll spend some time looking into this next week. I'm OOO 4/4 and 4/5.

justaddcoffee commented 6 months ago

Sure, thanks @sujaypatil96

sujaypatil96 commented 6 months ago

I have figured out how to pull in survival time (in days) and vital status information from GDC via the /analysis/survival endpoint. Now we just need to put them into the phenopacket's VitalStatus. Perhaps we can do this, along with some testing, in a mini hacking session @ielis @justaddcoffee?
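For reference, a sketch of how the values might be mapped onto the VitalStatus element of a GA4GH Phenopacket (schema v2), written as a plain JSON-style dict rather than with the `phenopackets` protobuf classes. We assume the v2 JSON field names `status` and `survivalTimeInDays` and the enum values `ALIVE`, `DECEASED` and `UNKNOWN_STATUS`; `vital_status_block` is a hypothetical helper, not oncoexporter's actual code.

```python
from typing import Optional

# Assumed mapping from GDC/CDA vital status strings to Phenopacket enum values.
_STATUS_MAP = {"Alive": "ALIVE", "Deceased": "DECEASED"}


def vital_status_block(vital_status: str, survival_days: Optional[float]) -> dict:
    """Build a VitalStatus-shaped dict for the phenopacket Subject."""
    block = {"status": _STATUS_MAP.get(vital_status, "UNKNOWN_STATUS")}
    if survival_days is not None:
        # survival_time_in_days is an unsigned integer in the schema.
        block["survivalTimeInDays"] = int(survival_days)
    return block


def subject_with_vital_status(case_id: str, vital_status: str,
                              survival_days: Optional[float]) -> dict:
    return {"id": case_id,
            "vitalStatus": vital_status_block(vital_status, survival_days)}
```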

justaddcoffee commented 6 months ago

great progress

@sujaypatil96 @ielis what about a hacking session Fri 2 pm Eastern US time on my zoom?

sujaypatil96 commented 6 months ago

I won't be able to make the hacking session on Friday because I'll be at all-day training sessions on Thursday 4/11 and Friday 4/12 this week.

But as far as the task goes, I think I've finished pulling in survival times and vital status information from GDC and have plugged them into the phenopackets as well. It would be great if you could test it and let me know.

sujaypatil96 commented 5 months ago

Notes from discussion with @justaddcoffee on 4/16.

If the patient is "Alive", we still need a way to populate the survival time in days, which we can compute as follows:

survival time (days) = day of last encounter (when we last saw the patient) - day of diagnosis
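A small sketch of this rule for living patients, assuming the two dates are available as calendar dates (if the source only provides day offsets from an index date, the same subtraction applies to the offsets). `survival_days_alive` is a hypothetical helper name.

```python
from datetime import date


def survival_days_alive(last_encounter: date, diagnosis: date) -> int:
    """Survival time in days for a living (censored) patient."""
    days = (last_encounter - diagnosis).days
    if days < 0:
        # A last encounter before diagnosis indicates a data error.
        raise ValueError("last encounter precedes diagnosis")
    return days
```

For example, `survival_days_alive(date(2020, 3, 1), date(2019, 3, 1))` gives 366, because the interval includes 29 February 2020.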