Closed GlastonburyC closed 4 years ago
pancan_scaled_rnaseq.tsv.gz includes sample-level information, while tcga-clinical_data.tsv includes patient-level information. The sample-level identifiers are much more descriptive than the patient-level identifiers.
Mapping between the two files can be done by subsetting TCGA barcodes. Info here: https://docs.gdc.cancer.gov/Encyclopedia/pages/TCGA_Barcode/
This file might also be helpful: https://github.com/greenelab/pancancer/blob/master/data/sample_freeze.tsv
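In case it helps, here is a minimal sketch of what "subsetting TCGA barcodes" could look like in plain Python. It relies only on the barcode layout described on the GDC page linked above (project-TSS-participant-sample/vial-...). The helper names, the toy IDs, and the cancer-type labels are made up for illustration and are not taken from either file.

```python
def patient_barcode(barcode: str) -> str:
    """Return the patient-level prefix: project, TSS and participant fields."""
    fields = barcode.split("-")
    if len(fields) < 3:
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return "-".join(fields[:3])


def sample_barcode(barcode: str) -> str:
    """Return the sample-level prefix: patient fields plus the 2-digit sample type."""
    fields = barcode.split("-")
    if len(fields) < 4:
        raise ValueError(f"no sample field in barcode: {barcode!r}")
    # The fourth field may carry a vial letter (e.g. "01C"); keep only the digits.
    return "-".join(fields[:3] + [fields[3][:2]])


# Toy stand-ins (hypothetical): patient-level metadata and sample-level IDs.
clinical = {"TCGA-02-0001": "GBM", "TCGA-04-0002": "OV"}
rnaseq_ids = ["TCGA-02-0001-01", "TCGA-02-0001-11", "TCGA-04-0002-01"]

# Attach the patient-level label to every sample whose patient is in the metadata.
sample_to_type = {
    s: clinical[patient_barcode(s)]
    for s in rnaseq_ids
    if patient_barcode(s) in clinical
}

# Keep only the samples belonging to one cancer type.
gbm_samples = [s for s, t in sample_to_type.items() if t == "GBM"]
print(gbm_samples)
```

Note that one patient can have several samples (for example tumor and matched normal), so truncating to the patient level is expected to produce repeated keys on the sample side.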
Isn't it the opposite?
pancan_scaled_rnaseq.tsv.gz looks like this:
Whereas the clinical data contains the full barcode.
In either direction, the mapping can be done the same way. I don't think I've used the portion_id column, though. Is there a sample_id column or something similar?
This file: https://github.com/greenelab/pancancer/blob/master/data/sample_freeze.tsv made it trivial. Thanks
Hi @gwaygenomics @cgreene, I would like to map samples (index values) in pancan_scaled_rnaseq.tsv.gz to the metadata in tcga-clinical_data.tsv. Currently the index values for the RNA-seq data are not unique, and I am unable to match them to the metadata.
Could you please advise on how, for example, I could subset the pancancer data to just a single cancer subtype (tying it to the metadata).