Closed grindhej closed 2 years ago
@grindhej
I think this is something for maftools
to implement since it is outside of the Bioconductor conventions.
It could be implemented as a helper function to make the names maftools
friendly and possibly applied
automatically.
The best case scenario is that maftools
supports both types of names (i.e., GRanges
and maftools
friendly).
The GRanges
names for sure increase interoperability with other packages that work on ranges.
For example, with the right names, one can easily do makeGRangesFromDataFrame
:
library(maftools)
library(MultiAssayExperiment)
laml.maf <- system.file('extdata', 'tcga_laml.maf.gz', package = 'maftools')
laml.clin <- system.file('extdata', 'tcga_laml_annot.tsv', package = 'maftools')
laml <- read.maf(maf = laml.maf, clinicalData = laml.clin)
lamf <- maf2mae(laml)
maef <- MultiAssayExperimentToMAF(lamf)
makeGRangesFromDataFrame(maef@data, ignore.strand=TRUE)
GRanges object with 1732 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] 10 37455582 *
[2] 20 35240439 *
[3] 2 25457243 *
[4] 2 209113112 *
[5] 5 170834736-170834737 *
... ... ... ...
[1728] 22 29688596 *
[1729] 4 48577210 *
[1730] 15 42641596 *
[1731] 16 2812699-2812700 *
[1732] 22 22324724 *
-------
seqinfo: 23 sequences from an unspecified genome; no seqlengths
Note that the use of @
is not standard practice (https://contributions.bioconductor.org/r-code.html?q=accessor#code-syntax-and-efficiency).
Best regards, Marcel
maftools likes chrom/start/stop variables to be specifically called Chromosome/Start_Position/End_Position
However, in maftools maf2mae() function, they are conversted to seqnames/start/end for GRanges. See maf2mae function below, for reference
maf2mae function (m = NULL) { if (any(!requireNamespace("RaggedExperiment") | !requireNamespace("MultiAssayExperiment"))) { stop("Converting to MultiAssayExperiment requires RaggedExperiment and MultiAssayExperiment packages!") } if (is.null(m)) { stop("Missing input MAF") } maf_coldata = S4Vectors::DataFrame(getClinicalData(m)) rownames(maf_coldata) = maf_coldata$Tumor_Sample_Barcode maf_nsyn_gr = GenomicRanges::GRanges(seqnames = m@data$Chromosome, ranges = IRanges::IRanges(start = m@data$Start_Position, end = m@data$End_Position), strand = "", S4Vectors::DataFrame(m@data[, setdiff(x = colnames(m@data), y = c("Chromosome", "Start_Position", "End_Position")), with = FALSE])) maf_nsyn_gr = split(maf_nsyn_gr, as.factor(maf_nsyn_gr$Tumor_Sample_Barcode)) maf_nsyn_re = RaggedExperiment::RaggedExperiment(maf_nsyn_gr, colData = maf_coldata) maf_syn_gr = GenomicRanges::GRanges(seqnames = m@maf.silent$Chromosome, ranges = IRanges::IRanges(start = m@maf.silent$Start_Position, end = m@maf.silent$End_Position), strand = "", S4Vectors::DataFrame(m@maf.silent[, setdiff(x = colnames(m@maf.silent), y = c("Chromosome", "Start_Position", "End_Position")), with = FALSE])) maf_syn_gr = split(maf_syn_gr, as.factor(maf_syn_gr$Tumor_Sample_Barcode)) maf_syn_re = RaggedExperiment::RaggedExperiment(maf_syn_gr, colData = maf_coldata) MultiAssayExperiment::MultiAssayExperiment(experiments = list(maf_nonSyn = maf_nsyn_re, maf_syn = maf_syn_re), metadata = list(summary = m@summary, variant.classification.summary = m@variant.classification.summary, variant.type.summary = m@variant.type.summary, variants.per.sample = m@variants.per.sample, gene.summary = m@gene.summary), colData = maf_coldata) }
It would be nice if MultiAssayExperimentToMAF converted them back to what maftools likes. Here's some example code where I pull a maf out of a mae with MultiAssayExperimentToMAF:
names(maf@data)[names(maf@data) == "seqnames"] = "Chromosome" names(maf@data)[names(maf@data) == "start"] = "Start_Position" names(maf@data)[names(maf@data) == "end"] = "End_Position" names(maf@maf.silent)[names(maf@maf.silent) == "seqnames"] = "Chromosome" names(maf@maf.silent)[names(maf@maf.silent) == "start"] = "Start_Position" names(maf@maf.silent)[names(maf@maf.silent) == "end"] = "End_Position"
I understand if there's some reason that isn't feasible.
Thanks!