waldronlab / TCGAutils

Toolbox package for organizing and working with TCGA data
https://bioconductor.org/packages/TCGAutils

symbolsToRanges: optionally drop unmapped #20

Closed lgeistlinger closed 5 years ago

lgeistlinger commented 5 years ago
> gbm
A MultiAssayExperiment object of 3 listed
 experiments with user-defined names and respective classes. 
 Containing an ExperimentList class object of length 3: 
 [1] GBM_CNVSNP-20160128: RaggedExperiment with 146852 rows and 1104 columns 
 [2] GBM_GISTIC_Peaks-20160128: RangedSummarizedExperiment with 68 rows and 577 columns 
 [3] GBM_RNASeq2GeneNorm-20160128: SummarizedExperiment with 20501 rows and 166 columns 
Features: 
 experiments() - obtain the ExperimentList instance 
 colData() - the primary/phenotype DataFrame 
 sampleMap() - the sample availability DataFrame 
 `$`, `[`, `[[` - extract colData columns, subset, or experiment 
 *Format() - convert into a long or wide DataFrame 
 assays() - convert ExperimentList to a SimpleList of matrices
> symbolsToRanges(gbm)
'select()' returned 1:many mapping between keys and columns
'select()' returned 1:1 mapping between keys and columns
A MultiAssayExperiment object of 4 listed
 experiments with user-defined names and respective classes. 
 Containing an ExperimentList class object of length 4: 
 [1] GBM_CNVSNP-20160128: RaggedExperiment with 146852 rows and 1104 columns 
 [2] GBM_GISTIC_Peaks-20160128: RangedSummarizedExperiment with 68 rows and 577 columns 
 [3] GBM_RNASeq2GeneNorm-20160128_ranged: RangedSummarizedExperiment with 17527 rows and 166 columns 
 [4] GBM_RNASeq2GeneNorm-20160128_unranged: SummarizedExperiment with 2974 rows and 166 columns 
Features: 
 experiments() - obtain the ExperimentList instance 
 colData() - the primary/phenotype DataFrame 
 sampleMap() - the sample availability DataFrame 
 `$`, `[`, `[[` - extract colData columns, subset, or experiment 
 *Format() - convert into a long or wide DataFrame 
 assays() - convert ExperimentList to a SimpleList of matrices

Initially, I thought that GBM_RNASeq2GeneNorm-20160128_unranged was the original SE and wondered why it was still included, although the keep argument of symbolsToRanges is FALSE by default. Then I figured out that these are actually the genes for which the mapping to ranges failed.

Having two arguments, say keep.original and keep.unmapped, might resolve this potential for confusion.
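To make the suggestion concrete, here is a minimal base R sketch of how two separate flags could behave. It is not TCGAutils code: the function split_by_mapping, the toy matrix, and the placeholder range strings are all hypothetical, and the defaults (keep.original = FALSE, keep.unmapped = TRUE) are only assumed from the output shown above.

```r
# Hypothetical sketch, not part of TCGAutils: split one assay by whether its
# row names (gene symbols) have a known genomic range.
split_by_mapping <- function(assay, ranges, name,
                             keep.original = FALSE, keep.unmapped = TRUE) {
  # TRUE for rows whose symbol appears in the lookup of known ranges
  mapped <- rownames(assay) %in% names(ranges)
  out <- list()
  if (keep.original) {
    # keep the untouched input under its original name
    out[[name]] <- assay
  }
  # rows that could be mapped always go into the "_ranged" element
  out[[paste0(name, "_ranged")]] <- assay[mapped, , drop = FALSE]
  if (keep.unmapped) {
    # rows without a range are kept only on request
    out[[paste0(name, "_unranged")]] <- assay[!mapped, , drop = FALSE]
  }
  out
}

# Toy data (placeholders): three symbols, one of which has no range.
expr <- matrix(1:6, nrow = 3,
               dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
ranges <- c(geneA = "chr1:1-100", geneB = "chr2:1-100")

names(split_by_mapping(expr, ranges, "RNASeq"))
names(split_by_mapping(expr, ranges, "RNASeq", keep.unmapped = FALSE))
names(split_by_mapping(expr, ranges, "RNASeq", keep.original = TRUE))
```

With separate flags, the element carrying the original name can only ever be the untouched input, and the _unranged element can only ever be the rows that failed to map, so the two cannot be mistaken for each other.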

The return value section of symbolsToRanges reads:

a MultiAssayExperiment where any of the original SummarizedExperiment containing gene symbols as rownames have been replaced or supplemented by a RangedSummarizedExperiment for miR that could be mapped to GRanges, and another SummarizedExperiment for miR that could not be mapped to GRanges

Why tie this specifically to miR? It's a general function, so I think using symbols or genes instead of miR would be appropriate.

LiNk-NY commented 5 years ago

This is handled by the keep.assay and unmapped arguments. Thanks for reporting this!