ncoudray / DeepPATH

Classification of lung cancer slide images using deep learning

How can I get the mutation information for a patient? I have downloaded the related dataset. #26

Closed: wxh09 closed this issue 5 years ago

wxh09 commented 5 years ago

I don't know how to get the mutation information for a patient ID in this form: "TCGA-38-4632 TP53", "TCGA-38-4632 FAT4". I can't find any mutation information in the file name (TCGA-86-8279-01A-01-BS1.fc1b4518-c751-49cb-a782-e8c684fb0917.svs) or in the JSON file.

ncoudray commented 5 years ago

Please see issue #14

ncoudray commented 5 years ago

We have also added more information to the README: 'When working with the TCGA dataset from the GDC Data Portal, the mutations can be found by filtering on Data Type == "Masked Somatic Mutations" (the Data Category is "Simple Nucleotide Variation"). With that filter, you will find 4 files per cancer type/project, one for each mutation caller. We used mutect for our paper. Each download is a gzipped file, and inside it there is a MAF file (also gzipped); a MAF file is just a tab-separated file with specific columns. The first column should be the Hugo Symbol (Hugo_Symbol), and there should also be a Tumor_Sample_Barcode column with the patient/sample ID. Silent mutations can also be filtered out if needed.'
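For anyone with the same question, here is a minimal Python sketch of those steps. It assumes you have already extracted the `.maf.gz` file from the downloaded archive, and it relies on the standard MAF columns `Hugo_Symbol`, `Tumor_Sample_Barcode` and `Variant_Classification`. Using `Variant_Classification == "Silent"` to drop silent mutations is an assumption on my part; the README text above does not name the column. The function names `patient_id` and `read_maf_mutations` are hypothetical and are not part of DeepPATH. Because TCGA slide file names and MAF sample barcodes both start with the same participant ID (`TCGA-XX-XXXX`), truncating both to their first three dash-separated fields is one way to link a slide to its patient's mutations.

```python
import csv
import gzip
import sys
from collections import defaultdict


def patient_id(barcode: str) -> str:
    """Return the TCGA participant ID: the first three dash-separated fields.

    Works for MAF sample barcodes (e.g. 'TCGA-38-4632-01A-...') and for slide
    file names (e.g. 'TCGA-86-8279-01A-01-BS1.<uuid>.svs' -> 'TCGA-86-8279').
    """
    return "-".join(barcode.split("-")[:3])


def read_maf_mutations(path: str, drop_silent: bool = True) -> dict:
    """Map each patient ID to the set of mutated genes listed in a MAF file.

    Assumes the standard MAF columns Hugo_Symbol, Tumor_Sample_Barcode and
    Variant_Classification; reads both plain and gzipped (.gz) files.
    """
    opener = gzip.open if path.endswith(".gz") else open
    mutations = defaultdict(set)
    with opener(path, "rt") as handle:
        # MAF files may start with '#' comment lines before the header row.
        rows = (line for line in handle if not line.startswith("#"))
        reader = csv.DictReader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Optionally skip silent (synonymous) mutations.
            if drop_silent and row.get("Variant_Classification") == "Silent":
                continue
            mutations[patient_id(row["Tumor_Sample_Barcode"])].add(row["Hugo_Symbol"])
    return dict(mutations)


if __name__ == "__main__":
    # Print one "patient gene" pair per line, e.g. "TCGA-38-4632 TP53".
    for maf_path in sys.argv[1:]:
        for pid, genes in sorted(read_maf_mutations(maf_path).items()):
            for gene in sorted(genes):
                print(pid, gene)
```

Run it as `python maf_to_pairs.py <file.maf.gz>` (the script name is hypothetical) to get output in the "TCGA-38-4632 TP53" shape asked about above. A set per patient means several mutations in the same gene collapse into one entry; note that truncating to the participant ID also merges multiple samples from the same patient, so keep the full `Tumor_Sample_Barcode` instead if you need per-sample resolution.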