Closed maryjgoldman closed 5 years ago
- Check that _EVENT = _OS_IND and _TIME_TO_EVENT = _OS, either in the code or by checking the data itself
Yes. https://github.com/yunhailuo/xena-GDC-ETL/blob/master/xena_gdc_etl/xena_dataset.py#L1821-L1823
- Remove _EVENT and _TIME_TO_EVENT from all survival data files
@ayan-b ~I think replacing these two lines with a rename should be enough: https://github.com/yunhailuo/xena-GDC-ETL/blob/master/xena_gdc_etl/xena_dataset.py#L1822-L1823~
Sorry, wrong lines. Rename here: https://github.com/yunhailuo/xena-GDC-ETL/blob/master/xena_gdc_etl/xena_dataset.py#L1816-L1817 You will probably also need to keep the map(int) below.
There is a problem with the Xena Browser around this. Not sure why but the browser is not recognizing the columns. Running this by Brian and Jing to see what we should do. May need to revert if we don't have the engineering time to fix the Xena Browser code ... :( :(
So, Jing figured it out. The field names are wrong and need to be renamed: _OS -> OS.time and _OS_IND -> OS.
Please rename and reload. You can do just one cohort if you want or, if it's easy, do all of them.
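For reference, the rename described above could be sketched as a header-only rewrite of a survival TSV. This is a minimal illustration, not the code in xena_dataset.py: the function name `rename_survival_header` and the two-sample example table are hypothetical, and only the two field mappings come from this thread.

```python
import csv
import io

# Field renames requested in this thread.
RENAMES = {"_OS": "OS.time", "_OS_IND": "OS"}


def rename_survival_header(tsv_text):
    """Return the TSV text with header fields renamed; data rows are untouched."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Only names listed in RENAMES change; every other column keeps its name.
    writer.writerow([RENAMES.get(name, name) for name in header])
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical two-sample table (sample IDs and values are made up).
example = "sample\t_OS_IND\t_OS\nS1\t1\t120\nS2\t0\t450\n"
renamed = rename_survival_header(example)
print(renamed)
```

Because only the header row changes, the same function could be run over each cohort's file before reloading.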
@maryjgoldman Updated Survival data for all the cohorts.
As far as I can tell this looks good. However, I will not be able to finish my QA until #63, which covers the extra samples that do not have any genomic data (the -Z samples), is done.
Removed the -01Z samples manually and finished the QA. This looks good.
To calculate survival, you need two columns: an event indicator and a time to event. Right now there are two pairs of columns in the survival data files: _EVENT with _TIME_TO_EVENT, and _OS_IND with _OS.
_EVENT and _TIME_TO_EVENT should be identical to _OS_IND and _OS, respectively (i.e. _EVENT = _OS_IND and _TIME_TO_EVENT = _OS). When we did the old GDC data, we were deprecating _EVENT and _TIME_TO_EVENT in favor of the more precise names _OS_IND and _OS. However, we kept _EVENT and _TIME_TO_EVENT to stay backward compatible with older Xena Browser releases. Enough time has passed that we no longer need that backward compatibility.
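A check on the data itself could look roughly like the sketch below, which verifies that each legacy column duplicates its replacement and then drops the legacy pair. It is an assumption-laden illustration rather than the ETL code: the function name `drop_legacy_columns`, the `sample` ID column, and the example rows are hypothetical, and values are compared as strings, so a file that stores `1` in one column and `1.0` in the other would be flagged.

```python
import csv
import io

# Legacy column -> the newer column it is expected to duplicate.
LEGACY_TO_CURRENT = {"_EVENT": "_OS_IND", "_TIME_TO_EVENT": "_OS"}


def drop_legacy_columns(tsv_text):
    """Check that legacy columns match their replacements, then drop them."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    for row in rows:
        for old, new in LEGACY_TO_CURRENT.items():
            # String comparison: "1" and "1.0" count as different values.
            if row[old] != row[new]:
                raise ValueError(
                    f"{row['sample']}: {old}={row[old]!r} differs from {new}={row[new]!r}"
                )
    return [
        {key: value for key, value in row.items() if key not in LEGACY_TO_CURRENT}
        for row in rows
    ]


# Hypothetical survival table (sample IDs and values are made up).
SURVIVAL_TSV = (
    "sample\t_EVENT\t_TIME_TO_EVENT\t_OS_IND\t_OS\n"
    "S1\t1\t120\t1\t120\n"
    "S2\t0\t450\t0\t450\n"
)

cleaned = drop_legacy_columns(SURVIVAL_TSV)
print(cleaned[0])
```

Raising on the first mismatch, instead of silently dropping the columns, keeps any file where the two pairs disagree from losing data unnoticed.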
To do:
Close this issue when these changes are on the hub, ready for QA.