lixiangchun / decipherMutationalSignatures

Deciphering mutational signatures with alternative algorithms
7 stars 4 forks source link

Help #2

Closed xtmgah closed 7 years ago

xtmgah commented 7 years ago

Hello Xiangchun:

could you able to share the R function "prepareInputsForDecipherMutationalSignatures" to preparing our data for mutation signature? Thanks.

xtmgah commented 7 years ago

Hello Xuangchun:

Are you able to run the original code (Mutational_Signature_Framework)? i try the example but have the following example, do you have any idea about this error? Thanks.

zhangt8@cn2783:/data/zhangt8/Ref/Mutational_Signature_Framework$ matlab -nodesktop -nosplash -r example1 libGL error: failed to load driver: swrast X Error of failed request: BadValue (integer parameter out of range for operation) Major opcode of failed request: 149 (GLX) Minor opcode of failed request: 3 (X_GLXCreateContext) Value in failed request: 0x0 Serial number of failed request: 21 Current serial number in output stream: 24 MATLAB is selecting SOFTWARE OPENGL rendering.

                                                           < M A T L A B (R) >
                                                 Copyright 1984-2014 The MathWorks, Inc.
                                                  R2014b (8.4.0.150421) 64-bit (glnxa64)
                                                            September 15, 2014

To get started, type one of these: helpwin, helpdesk, or demo. For product information, visit www.mathworks.com.

Warning: matlabpool will be removed in a future release. To query the size of an already started parallel pool, query the 'NumWorkers' property of the pool. To check if a pool is already started use 'isempty(gcp('nocreate'))'. Warning: matlabpool will be removed in a future release. Use parpool instead. Starting matlabpool using the 'local' profile ... connected to 6 workers. Warning: matlabpool will be removed in a future release. To query the size of an already started parallel pool, query the 'NumWorkers' property of the pool. To check if a pool is already started use 'isempty(gcp('nocreate'))'. Extracting 4 mutational signatures for 10 iterations on 6 labs: Lab 1: Iteration 1 of 10 for extracting 4 mutational signatures in 21 samples completed. Iteration 2 of 10 for extracting 4 mutational signatures in 21 samples completed. Iteration 3 of 10 for extracting 4 mutational signatures in 21 samples completed. Iteration 4 of 10 for extracting 4 mutational signatures in 21 samples completed. Iteration 5 of 10 for extracting 4 mutational signatures in 21 samples completed. Iteration 6 of 10 for extracting 4 mutational signatures in 21 samples completed. Iteration 7 of 10 for extracting 4 mutational signatures in 21 samples completed. Iteration 8 of 10 for extracting 4 mutational signatures in 21 samples completed. Iteration 9 of 10 for extracting 4 mutational signatures in 21 samples completed. Iteration 10 of 10 for extracting 4 mutational signatures in 21 samples completed. Index exceeds matrix dimensions.

Error in plotSignatures (line 125) x = mean(x([1 3],:));

Error in example1 (line 36) plotSignatures(processes, input, allProcesses, idx, processStabAvg);

MATLAB has experienced a low-level graphics error, and may not have drawn correctly. Read about what you can do to prevent this issue by running this command: opengl problems, then restart MATLAB. To share details of this issue with MathWorks technical support, please include this file with your service request: /home/zhangt8/jogl.ex.21014

lixiangchun commented 7 years ago

I have no idea about this Matlab error either.

The source code for "prepareInputsForDecipherMutationalSignatures" are posted as below. Before running prepareInputsForDecipherMutationalSignatures, you need to install several R packages, including "VariantAnnotation", "BSgenome.Hsapiens.UCSC.hg19", "TxDb.Hsapiens.UCSC.hg19.knownGene", "fastcluster", "seqminer", "Ckmeans.1d.dp", "WhopGenome".

prepareInputsForDecipherMutationalSignatures <- function (maf.file, vr = NULL, ref = NULL, strand.bias = FALSE) 
{
    library(BSgenome)
    library(VariantAnnotation)
    if (is.null(vr)) {
        vr = makeVRangesFromOncotator(maf.file = maf.file, 
            removeIndel = TRUE, removeSilent = FALSE, hyperCutoff = NULL)
    }
    if (is.null(ref)) {
        library(BSgenome.Hsapiens.UCSC.hg19)
        ref <- Hsapiens
    }
    mat = get.lego.matrix(vr, ref, strand.bias = strand.bias)
    a = rownames(mat)
    if (strand.bias) {
        strand.info <- substring(a, nchar(a[1]), nchar(a[1]))
    }
    types = paste(substring(a, 1, 1), substring(a, 2, 2), sep = ">")
    subtypes = paste(substring(a, 3, 3), substring(a, 1, 1), 
        substring(a, 4, 4), sep = "")
    if (strand.bias) {
        types <- paste(types, strand.info, sep = "-")
        subtypes <- paste(subtypes, strand.info, sep = "-")
    }
    write.table(mat, file = "originalGenomes.txt", row.names = FALSE, 
        col.names = FALSE, sep = "\t", quote = FALSE)
    write.table(colnames(mat), file = "sampleNames.txt", row.names = FALSE, 
        col.names = FALSE, quote = FALSE)
    write.table(types, file = "types.txt", quote = FALSE, row.names = FALSE, 
        col.names = FALSE)
    write.table(subtypes, file = "subtypes.txt", quote = FALSE, 
        row.names = FALSE, col.names = FALSE)
}
getSilentMutationTypes <- function () 
{
    silentMutations <- c("3'UTR", "5'UTR", "3'Flank", "Targeted_Region", 
        "Silent", "Intron", "RNA", "IGR", "Splice_Region", "5'Flank", 
        "lincRNA")
    return(silentMutations)
}
makeVRangesFromOncotator <- function (maf.file, removeIndel = TRUE, removeSilent = TRUE, 
    hyperCutoff = NULL, removeRndHapChrs = TRUE) 
{
    d <- suppressWarnings(data.table::fread(maf.file, sep = "\t"))
    if (removeRndHapChrs) {
        chrs <- c(c(1:22, "X", "Y", "M"), paste("chr", c(1:22, 
            "X", "Y", "M"), sep = ""))
        d <- subset(d, Chromosome %in% chrs)
    }
    totalMutation <- table(d$Tumor_Sample_Barcode)
    Tumor_Sample_Barcodes <- names(totalMutation)
    message(sprintf("Total sample n = %d.", length(Tumor_Sample_Barcodes)))
    if (!is.null(hyperCutoff)) {
        message(sprintf("Here I use optimal univariate k-means clustering to identify hypermutated samples. Samples belonging to clusters with centers > %g are considered to be hypermutated.", 
            hyperCutoff))
        r <- Ckmeans.1d.dp::Ckmeans.1d.dp(totalMutation)
        hyperMutatedClustIds <- which(r$centers > hyperCutoff)
        hyperMutatedSamples <- names(totalMutation)[r$cluster %in% 
            hyperMutatedClustIds]
        d <- subset(d, !(Tumor_Sample_Barcode %in% hyperMutatedSamples))
        message(sprintf("Removed hypermutated samples (n = %d):", 
            length(hyperMutatedSamples)))
        print(totalMutation[hyperMutatedSamples], file = stderr())
    }
    if (removeIndel) {
        d <- subset(d, Variant_Type == "SNP")
    }
    silentMutations <- getSilentMutationTypes()
    if (removeSilent) {
        d <- subset(d, !(Variant_Classification %in% silentMutations))
    }
    vr <- VariantAnnotation::VRanges(d$Chromosome, IRanges::IRanges(d$Start_position, 
        d$End_position), ref = d$Reference_Allele, alt = d$Tumor_Seq_Allele2, 
        sampleNames = d$Tumor_Sample_Barcode, Variant_Type = as.character(d$Variant_Type), 
        cDNA_Change = as.character(d$cDNA_Change), Codon_Change = as.character(d$Codon_Change), 
        Protein_Change = as.character(d$Protein_Change), Variant_Classification = as.character(d$Variant_Classification), 
        Hugo_Symbol = as.character(d$Hugo_Symbol), Entrez_Gene_Id = as.character(d$Entrez_Gene_Id))
    GenomicRanges::strand(vr) <- "*"
    return(vr)
}

annotateVRanges <- function (vr, removeDuplicate = TRUE) 
{
    if (!exists("Hsapiens")) 
        library(BSgenome.Hsapiens.UCSC.hg19)
    if (!exists("TxDb.Hsapiens.UCSC.hg19.knownGene")) 
        library(TxDb.Hsapiens.UCSC.hg19.knownGene)
    vr <- VariantAnnotation::predictCoding(vr, TxDb.Hsapiens.UCSC.hg19.knownGene, 
        Hsapiens, varAllele = DNAStringSet(vr@alt))
    if (removeDuplicate) {
        queryIds <- vr$QUERYID
        idx <- cumsum(table(queryIds))
        vr <- vr[idx]
    }
    return(vr)
}

get.lego.matrix <- function (vr, fa, strand.bias = FALSE) 
{
    if (strand.bias) {
        vr = annotateVRanges(vr, removeDuplicate = TRUE)
        motifs.t <- mutationContext(vr[strand(vr) == Rle("-")], 
            fa)
        lego96.t <- motifMatrix(motifs.t, group = "sampleNames", 
            normalize = FALSE)
        rownames(lego96.t) <- sub("\\.", "", sub(" ", "", rownames(lego96.t)))
        rownames(lego96.t) <- paste(rownames(lego96.t), "-t", 
            sep = "")
        motifs.n <- mutationContext(vr[strand(vr) == Rle("+")], 
            fa)
        lego96.n <- motifMatrix(motifs.n, group = "sampleNames", 
            normalize = FALSE)
        rownames(lego96.n) <- sub("\\.", "", sub(" ", "", rownames(lego96.n)))
        rownames(lego96.n) <- paste(rownames(lego96.n), "-n", 
            sep = "")
        i1 <- 1:96
        i2 <- i1 + 96
        i <- unlist(lapply(1:96, function(i) c(i1[i], i2[i])))
        lego96 <- rbind(lego96.t, lego96.n)[i, ]
    }
    else {
        motifs <- mutationContext(vr, fa)
        lego96 <- motifMatrix(motifs, group = "sampleNames", 
            normalize = FALSE)
        rownames(lego96) <- sub("\\.", "", sub(" ", "", rownames(lego96)))
    }
    return(lego96)
}

mutationContext <- function (vr, ref, k = 3, strand = FALSE, unify = TRUE, check = FALSE) 
{
   ## source code borrowed from R package SomaticSignatures with BUGS fixed.
    if (!all(ref(vr) %in% DNA_BASES & as.character(alt(vr)) %in% 
        DNA_BASES)) 
        stop("Only SNV substitutions are currently supported.")
    if (k%%2 != 1) 
        stop("'k' must be odd.")
    mid = (k + 1)/2
    gr = granges(vr)
    ranges = resize(gr, k, fix = "center")
    context = getSeq(ref, ranges)
    ref_base = DNAStringSet(ref(vr))
    alt_base = DNAStringSet(alt(vr))
    if (check) {
        ref0 = subseq(context, mid, mid)
        idx_invalid = (ref0 != ref_base)
        if (any(idx_invalid)) 
            warning(sprintf("References do not match in %d cases", 
                sum(idx_invalid)))
    }
    if (strand) {
        s = strand(gr)
        if (any(s == "*")) 
            stop("The strand must be explicit, in order to read the correct strand.")
        idx_minus = (s == "-")
        context[idx_minus] = reverseComplement(context[idx_minus])
        s[idx_minus] = "+"
        strand(gr) = s
    }
    if (unify) {
        idx_complement = ref(vr) %in% c("A", "G")
        context[idx_complement] = reverseComplement(context[idx_complement])
        ref_base[idx_complement] = reverseComplement(ref_base[idx_complement])
        alt_base[idx_complement] = reverseComplement(alt_base[idx_complement])
    }
    subseq(context, mid, mid) = "."
    alteration = xscat(ref_base, alt_base)
    vr$alteration = alteration
    vr$context = context
    return(vr)
}

motifMatrix <- function (vr, group = "sampleNames", normalize = TRUE) 
{
    voi <- if (group %in% names(mcols(vr))) {
        mcols(vr)[, group]
    }
    else {
        df = as(unname(vr), "data.frame")
        if (!(group %in% colnames(df))) {
            stop(sprintf("Column '%s' not present in input object.", 
                group))
        }
        df[, group]
    }
    motif = factor(paste(vr$alteration, vr$context), levels = constructMotifs3())
    y = as(table(motif, voi), "matrix")
    dimnames(y) = unname(dimnames(y))
    if (normalize) {
        y = t(t(y)/colSums(y))
    }
    return(y)
}

Fell free to contact me if you encounter errors.

AteeqMKhaliq commented 7 years ago

Hi, While running the command, i am encountering the following errors, Please let me know on this.

prepareInputsForDecipherMutationalSignatures("/home/ESCA-CN_squam_icgc.maf", strand.bias=FALSE) Error in makeVRangesFromOncotator(maf.file = maf.file, removeIndel = TRUE, : could not find function "makeVRangesFromOncotator"

lixiangchun commented 7 years ago

Code for makeVRangesFromOncotator:

makeVRangesFromOncotator <- function (maf.file, removeIndel = TRUE, removeSilent = FALSE, 
    hyperCutoff = NULL, removeRndHapChrs = TRUE) 
{
    d <- suppressWarnings(data.table::fread(maf.file, sep = "\t"))
    if (removeRndHapChrs) {
        chrs <- c(c(1:22, "X", "Y", "M"), paste("chr", c(1:22, 
            "X", "Y", "M"), sep = ""))
        d <- subset(d, Chromosome %in% chrs)
    }
    totalMutation <- table(d$Tumor_Sample_Barcode)
    Tumor_Sample_Barcodes <- names(totalMutation)
    message(sprintf("Total sample n = %d.", length(Tumor_Sample_Barcodes)))
    if (!is.null(hyperCutoff)) {
        message(sprintf("Here I use optimal univariate k-means clustering to identify hypermutated samples. Samples belonging to clusters with centers > %g are considered to be hypermutated.", 
            hyperCutoff))
        r <- Ckmeans.1d.dp::Ckmeans.1d.dp(totalMutation)
        hyperMutatedClustIds <- which(r$centers > hyperCutoff)
        hyperMutatedSamples <- names(totalMutation)[r$cluster %in% 
            hyperMutatedClustIds]
        d <- subset(d, !(Tumor_Sample_Barcode %in% hyperMutatedSamples))
        message(sprintf("Removed hypermutated samples (n = %d):", 
            length(hyperMutatedSamples)))
        print(totalMutation[hyperMutatedSamples], file = stderr())
    }
    if (removeIndel) {
        d <- subset(d, Variant_Type == "SNP")
    }
    silentMutations <- getSilentMutationTypes()
    if (removeSilent) {
        d <- subset(d, !(Variant_Classification %in% silentMutations))
    }
    vr <- VariantAnnotation::VRanges(d$Chromosome, IRanges::IRanges(d$Start_position, 
        d$End_position), ref = d$Reference_Allele, alt = d$Tumor_Seq_Allele2, 
        sampleNames = d$Tumor_Sample_Barcode, Variant_Type = as.character(d$Variant_Type), 
        cDNA_Change = as.character(d$cDNA_Change), Codon_Change = as.character(d$Codon_Change), 
        Protein_Change = as.character(d$Protein_Change), Variant_Classification = as.character(d$Variant_Classification), 
        Hugo_Symbol = as.character(d$Hugo_Symbol), Entrez_Gene_Id = as.character(d$Entrez_Gene_Id))
    GenomicRanges::strand(vr) <- "*"
    return(vr)
}
AteeqMKhaliq commented 7 years ago

Should i run this command seperatly? First i ran the following command "prepareInputsForDecipherMutationalSignatures(maf.file, strand.bias=FALSE)" then it showed the error. Please elaborate on this

Error"Error in makeVRangesFromOncotator(maf.file = maf.file, removeIndel = TRUE, : could not find function "makeVRangesFromOncotator"

AteeqMKhaliq commented 7 years ago

HI, It would be great if you let me know about library(cgat). I am not able to find it.