maryjgoldman closed 5 years ago
The Masked segmented copy number dataset already removes these samples. Please follow the code that was already written for this.
~CNV and Masked CNV are treated similarly.~
~Remove blood normal was done for clinical data: https://github.com/yunhailuo/xena-GDC-ETL/blob/master/xena_gdc_etl/xena_dataset.py#L1622-L1629~
One of the samples in the screenshot and bookmark I provided that has no clinical data but has segmented CNV data is TCGA-19-0955-10A. If you look it up in the GDC, you can see it is a blood derived normal.
If no code has been written yet for removing blood derived normals from the masked segmented copy number data, we need to write it for the segmented CNV data.
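For illustration, here is a minimal sketch of what such a filter could look like. It is not the code in `xena_dataset.py`. It assumes the segment rows carry full TCGA sample barcodes in a `sample` column, and it relies on the TCGA barcode convention that the first two digits of the fourth field are the sample type code, with `10` meaning Blood Derived Normal (as in TCGA-19-0955-10A). The function names, the column names, and the tumor barcode and segment values in the example are made up for the sketch.

```python
# TCGA sample type code for "Blood Derived Normal".
BLOOD_DERIVED_NORMAL = "10"


def sample_type_code(barcode):
    """Return the two-digit sample type code from a TCGA sample barcode.

    For "TCGA-19-0955-10A" the fourth field is "10A":
    sample type code "10", vial "A".
    """
    fields = barcode.split("-")
    if len(fields) < 4 or len(fields[3]) < 2:
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    return fields[3][:2]


def drop_blood_derived_normal(segments, sample_key="sample"):
    """Keep only segment rows whose sample is not a blood derived normal.

    `segments` is a list of dicts; `sample_key` (an assumed column name)
    holds the TCGA sample barcode.
    """
    return [
        row
        for row in segments
        if sample_type_code(row[sample_key]) != BLOOD_DERIVED_NORMAL
    ]


# Illustrative rows only; the segment values are invented.
segments = [
    {"sample": "TCGA-19-0955-01A", "Chrom": "chr1", "value": 0.12},
    {"sample": "TCGA-19-0955-10A", "Chrom": "chr1", "value": 0.01},
]
tumor_only = drop_blood_derived_normal(segments)
# tumor_only now holds just the TCGA-19-0955-01A row.
```

Filtering on the barcode is only one option. If the ETL already has the GDC `sample_type` field for each file, filtering on that field would avoid depending on the barcode format.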
~You should see some of them in CNV, Masked CNV and 4 types of SNV data. Have you?~
I'm really sorry. My bad. Masked CNV data are filtered for blood derived normal here: https://github.com/yunhailuo/xena-GDC-ETL/blob/master/xena_gdc_etl/xena_dataset.py#L874-L894
I might be getting really close to Alzheimer or something...
@ayan-b Please add that filter to CNV data (4 lines above). Thank you!
@maryjgoldman Added CNV data to the hub.
This looks good. Blood derived normals are removed.
The new segmented copy number datasets (DNAcopy) have copy number data for samples that are blood normal. We only want copy number data for the tumor samples.
The samples in black are the ones whose segmented copy number data we want to remove.
newCNV-bloodnormaltoberemoved.txt
This is a bookmark of the above data. It can be imported back into Xena via the bookmark menu.