waldronlab / curatedTCGAData

Curated Data From The Cancer Genome Atlas (TCGA) as MultiAssayExperiment Objects
https://bioconductor.org/packages/curatedTCGAData

converting Mutations components to maftools-friendly form? #18

Open vjcitn opened 6 years ago

vjcitn commented 6 years ago

I am finding it challenging to convert the RaggedExperiment to a more MAF-like tabular form. Am I missing something? Maybe we should add a component with MAF content, perhaps as a dense GRanges, named "MAF"? I think this would be used more readily, and we already have code that converts MAF to RaggedExperiment, which could be provided as a tool.

lwaldron commented 6 years ago

A couple of thoughts:

  1. A MAF-like form wouldn't be compatible with MultiAssayExperiment, so a coercion method to DataFrame would probably belong in the RaggedExperiment package.
  2. It is a pain to convert these TCGA RaggedExperiments to matrices, or to a RangedSummarizedExperiment with one row per gene, in the same form as the RNA-seq datasets. @vjcitn and @LiNk-NY, would you try out the helper function in this gist and see if you find it useful? Currently it just converts the RaggedExperiments to genes x samples matrices, but I could easily have it convert to RangedSummarizedExperiment.

https://gist.github.com/lwaldron/47fb0c0bece56f58b762192c24117231

lwaldron commented 6 years ago

The gist now converts the RaggedExperiments to RangedSummarizedExperiments, instead of matrices.
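
For readers who have not opened the gist, the general idea of a per-gene conversion can be sketched with `qreduceAssay()` from RaggedExperiment. This is a minimal illustration on made-up data, not the gist's code: the sample names, gene names, gene ranges, and the `mutated` column are all hypothetical, and in practice the gene ranges would come from an annotation resource.

```r
library(GenomicRanges)
library(RaggedExperiment)
library(SummarizedExperiment)

# Toy per-sample mutation calls; 'mutated' is a hypothetical numeric flag
s1 <- GRanges(c("chr1:100-100", "chr2:200-201"), mutated = c(1, 1))
s2 <- GRanges("chr1:150-150", mutated = 1)
re <- RaggedExperiment(sampleA = s1, sampleB = s2,
                       colData = DataFrame(id = 1:2))

# Hypothetical gene ranges used as the query (one row per gene in the result)
genes <- GRanges(c(GENE1 = "chr1:50-300", GENE2 = "chr2:100-400"))

# Collapse all calls that overlap one gene in one sample into a single count
count_calls <- function(scores, ranges, qranges) sum(scores)

# genes x samples matrix; gene/sample pairs with no overlapping call get 0
mat <- qreduceAssay(re, genes, count_calls, i = "mutated", background = 0)
rownames(mat) <- names(genes)  # replace "chr:start-end" labels with gene names

# Attach the gene ranges to obtain a RangedSummarizedExperiment
rse <- SummarizedExperiment(assays = list(mutations = mat), rowRanges = genes)
```

Counting calls is only one possible reduction; the function passed as `simplifyReduce` could equally return a 0/1 indicator or a weighted mean for copy-number segments.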

lwaldron commented 6 years ago

Back to my point 1: this coercion method could be useful for GRangesList as well as for RaggedExperiment, so it's not even just a RaggedExperiment question.
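
As a rough illustration of what such a coercion could look like, the sketch below goes from a RaggedExperiment to a GRangesList and then to a long, MAF-like data.frame. It is not an existing method of either package: the toy data and sample names are made up, it assumes the `as(x, "GRangesList")` coercion from RaggedExperiment keeps the per-range metadata columns, and the renaming to MAF column names is done by hand.

```r
library(GenomicRanges)
library(RaggedExperiment)

# Toy per-sample mutation calls (hypothetical values, for illustration only)
s1 <- GRanges(c("chr1:100-100", "chr2:200-201"),
              Hugo_Symbol = c("GENE1", "GENE2"),
              Variant_Classification = c("Missense_Mutation", "Frame_Shift_Del"))
s2 <- GRanges("chr1:150-150",
              Hugo_Symbol = "GENE1",
              Variant_Classification = "Nonsense_Mutation")
re <- RaggedExperiment(sampleA = s1, sampleB = s2,
                       colData = DataFrame(id = 1:2))

# One GRanges per sample; the same steps apply to any named GRangesList
grl <- as(re, "GRangesList")

# Long format: one row per range, with 'group_name' holding the sample name
long <- as.data.frame(grl)

# Rename to MAF-style column names
maf_like <- data.frame(
  Hugo_Symbol            = long$Hugo_Symbol,
  Chromosome             = as.character(long$seqnames),
  Start_Position         = long$start,
  End_Position           = long$end,
  Variant_Classification = long$Variant_Classification,
  Tumor_Sample_Barcode   = long$group_name,
  stringsAsFactors       = FALSE
)
```

A table like this is still missing fields that `maftools::read.maf()` requires, such as Reference_Allele, Tumor_Seq_Allele2 and Variant_Type, so those would need to be carried along from the metadata columns if they are present.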

lwaldron commented 6 years ago

@vjcitn and @LiNk-NY take a look at the conveniencefuns branch I just pushed. It's far from perfect but does the following:

> accmae <- curatedTCGAData("ACC", c("CNASNP", "Mutation", "miRNASeqGene", "GISTICT"), dry.run = FALSE)
> accmae
A MultiAssayExperiment object of 4 listed
 experiments with user-defined names and respective classes. 
 Containing an ExperimentList class object of length 4: 
 [1] ACC_CNASNP-20160128: RaggedExperiment with 79861 rows and 180 columns 
 [2] ACC_GISTIC_ThresholdedByGene-20160128: SummarizedExperiment with 24776 rows and 90 columns 
 [3] ACC_miRNASeqGene-20160128: SummarizedExperiment with 1046 rows and 80 columns 
 [4] ACC_Mutation-20160128: RaggedExperiment with 20166 rows and 90 columns 
Features: 
 experiments() - obtain the ExperimentList instance 
 colData() - the primary/phenotype DataFrame 
 sampleMap() - the sample availability DataFrame 
 `$`, `[`, `[[` - extract colData columns, subset, or experiment 
 *Format() - convert into a long or wide DataFrame 
 assays() - convert ExperimentList to a SimpleList of matrices
> simplemae <- simplifyTCGA(accmae)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
'select()' returned 1:1 mapping between keys and columns
> simplemae
A MultiAssayExperiment object of 6 listed
 experiments with user-defined names and respective classes. 
 Containing an ExperimentList class object of length 6: 
 [1] ACC_Mutation-20160128_simplified: RangedSummarizedExperiment with 22945 rows and 90 columns 
 [2] ACC_CNASNP-20160128_simplified: RangedSummarizedExperiment with 22945 rows and 180 columns 
 [3] ACC_miRNASeqGene-20160128_ranged: RangedSummarizedExperiment with 1002 rows and 80 columns 
 [4] ACC_miRNASeqGene-20160128_unranged: SummarizedExperiment with 44 rows and 80 columns 
 [5] ACC_GISTIC_ThresholdedByGene-20160128_ranged: RangedSummarizedExperiment with 19601 rows and 90 columns 
 [6] ACC_GISTIC_ThresholdedByGene-20160128_unranged: SummarizedExperiment with 5175 rows and 90 columns 
Features: 
 experiments() - obtain the ExperimentList instance 
 colData() - the primary/phenotype DataFrame 
 sampleMap() - the sample availability DataFrame 
 `$`, `[`, `[[` - extract colData columns, subset, or experiment 
 *Format() - convert into a long or wide DataFrame 
 assays() - convert ExperimentList to a SimpleList of matrices
> rownames(simplemae)
CharacterList of length 6
[["ACC_Mutation-20160128_simplified"]] A1BG NAT2 ADA CDH2 AKT3 ... KCNE2 DGCR2 CASP8AP2 SCO2
[["ACC_CNASNP-20160128_simplified"]] A1BG NAT2 ADA CDH2 AKT3 ... KCNE2 DGCR2 CASP8AP2 SCO2
[["ACC_miRNASeqGene-20160128_ranged"]] hsa-let-7a-1 hsa-let-7a-2 ... hsa-mir-99a hsa-mir-99b
[["ACC_miRNASeqGene-20160128_unranged"]] hsa-mir-103-1 hsa-mir-103-1-as ... hsa-mir-941-4
[["ACC_GISTIC_ThresholdedByGene-20160128_ranged"]] ACAP3 ACTRT2 AGRN ... SNORA56 TMLHE VBP1
[["ACC_GISTIC_ThresholdedByGene-20160128_unranged"]] C1orf170 ... WASIR1|ENSG00000185203.7
>