symbolsToRanges: optionally drop unmapped

> gbm
A MultiAssayExperiment object of 3 listed
 experiments with user-defined names and respective classes. 
 Containing an ExperimentList class object of length 3: 
 [1] GBM_CNVSNP-20160128: RaggedExperiment with 146852 rows and 1104 columns 
 [2] GBM_GISTIC_Peaks-20160128: RangedSummarizedExperiment with 68 rows and 577 columns 
 [3] GBM_RNASeq2GeneNorm-20160128: SummarizedExperiment with 20501 rows and 166 columns 
Features: 
 experiments() - obtain the ExperimentList instance 
 colData() - the primary/phenotype DataFrame 
 sampleMap() - the sample availability DataFrame 
 `$`, `[`, `[[` - extract colData columns, subset, or experiment 
 *Format() - convert into a long or wide DataFrame 
 assays() - convert ExperimentList to a SimpleList of matrices

> symbolsToRanges(gbm)
'select()' returned 1:many mapping between keys and columns
'select()' returned 1:1 mapping between keys and columns
A MultiAssayExperiment object of 4 listed
 experiments with user-defined names and respective classes. 
 Containing an ExperimentList class object of length 4: 
 [1] GBM_CNVSNP-20160128: RaggedExperiment with 146852 rows and 1104 columns 
 [2] GBM_GISTIC_Peaks-20160128: RangedSummarizedExperiment with 68 rows and 577 columns 
 [3] GBM_RNASeq2GeneNorm-20160128_ranged: RangedSummarizedExperiment with 17527 rows and 166 columns 
 [4] GBM_RNASeq2GeneNorm-20160128_unranged: SummarizedExperiment with 2974 rows and 166 columns 
Features: 
 experiments() - obtain the ExperimentList instance 
 colData() - the primary/phenotype DataFrame 
 sampleMap() - the sample availability DataFrame 
 `$`, `[`, `[[` - extract colData columns, subset, or experiment 
 *Format() - convert into a long or wide DataFrame 
 assays() - convert ExperimentList to a SimpleList of matrices

Initially, I thought that GBM_RNASeq2GeneNorm-20160128_unranged was the original SE and wondered why it was still included, given that the keep argument of symbolsToRanges is FALSE by default. Then I figured out that these are actually the genes for which the mapping to ranges failed (17527 ranged + 2974 unranged = 20501 rows, matching the original experiment).

Having two arguments, say keep.original and keep.unmapped, might resolve this potential for confusion:

keep.original corresponds to the current keep, and
keep.unmapped decides whether genes that could not be mapped to ranges are kept as a separate experiment or dropped. The current behavior corresponds to keep.unmapped=TRUE. One could argue that keep.unmapped=FALSE, together with a message reporting the data loss, would match the needs of most downstream analyses, at least for the applications I have in mind.
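
To make the proposed semantics concrete, here is a minimal base-R sketch. It is not the TCGAutils implementation: the function splitBySymbolMapping, its name argument, and the ranges lookup table are all hypothetical stand-ins, and plain matrices take the place of SummarizedExperiment objects so that the example runs without Bioconductor.

```r
# Hypothetical sketch, NOT the TCGAutils implementation: a base-R stand-in
# for the proposed keep.original / keep.unmapped semantics.
# `ranges` is an assumed lookup table (symbol -> coordinates) that plays the
# role of the annotation database.
splitBySymbolMapping <- function(exprs, ranges, name = "assay",
                                 keep.original = FALSE,
                                 keep.unmapped = TRUE) {
    # A row counts as mapped if its symbol has an entry in the lookup table
    mapped <- rownames(exprs) %in% ranges$symbol
    out <- list()
    if (keep.original)
        out[[name]] <- exprs
    out[[paste0(name, "_ranged")]] <- exprs[mapped, , drop = FALSE]
    if (keep.unmapped) {
        out[[paste0(name, "_unranged")]] <- exprs[!mapped, , drop = FALSE]
    } else if (any(!mapped)) {
        # Report the data loss instead of dropping rows silently
        message(sum(!mapped), " of ", length(mapped),
                " rows could not be mapped to ranges and were dropped")
    }
    out
}

# Toy data: "FAKE1" has no entry in the lookup table
exprs <- matrix(1:8, nrow = 4,
                dimnames = list(c("TP53", "EGFR", "FAKE1", "PTEN"),
                                c("s1", "s2")))
ranges <- data.frame(symbol = c("TP53", "EGFR", "PTEN"),
                     chr = c("chr17", "chr7", "chr10"),
                     stringsAsFactors = FALSE)

res <- splitBySymbolMapping(exprs, ranges, name = "RNASeq",
                            keep.unmapped = FALSE)
names(res)
```

With keep.unmapped = FALSE the result holds only the ranged part and the user is told how many rows were lost; with the default keep.unmapped = TRUE the sketch reproduces the current _ranged / _unranged split.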

The return value section of symbolsToRanges reads:

a MultiAssayExperiment where any of the original SummarizedExperiment containing gene symbols as rownames have been replaced or supplemented by a RangedSummarizedExperiment for miR that could be mapped to GRanges, and another SummarizedExperiment for miR that could not be mapped to GRanges

Why tie this specifically to miR? It's a general function, so I think using "symbols" or "genes" instead of "miR" would be more appropriate.

waldronlab / TCGAutils

symbolsToRanges: optionally drop unmapped #20