hoonghim commented 4 years ago

Dear Ha X. Dang,

Hello, I am trying to analyze clonal evolution using PyClone and ClonEvol.

I have two WES samples from one patient.

When I followed the manual, I could not infer clonal models.

Here is my final input file for ClonEvol (it is stored in pyCloneResultMeltDcastDf below).

clonevol_input.txt

This is the original outcome from PyClone

KRCMC01270.PyClone.loci_results.txt

Below is the code I used to run ClonEvol.

#########################################################################

    library(data.table)
    library(clonevol)
    library(reshape2)
    library(tidyr)

    pyCloneResult <- "/Absolute path/KRCMC01270.PyClone.loci_results.txt"
    pyCloneResultDf <- fread(pyCloneResult)

To change the data frame structure - [mutation_id - sample_id - cluster_id - cellular_prevalence - cellular_prevalence_std - variant_allele_frequency] -> [mutation_id - cluster_id - sample1.vaf - sample2.vaf - sample1.cellular_prevalence - sample2.cellular_prevalence - sample1.cellular_prevalence_std - sample2.cellular_prevalence_std]

https://stackoverflow.com/questions/11608167/reshape-multiple-value-columns-to-wide-format

    pyCloneResultMeltDf <- melt(pyCloneResultDf, id.vars=c("mutation_id", "cluster_id", "sample_id"))

    pyCloneResultMeltDcastDf <- dcast(pyCloneResultMeltDf, cluster_id + mutation_id ~ sample_id + variable)
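The long-to-wide reshape described above can be tried on toy data first. This is only a sketch: the sample IDs `S1`/`S2`, the mutation IDs, and all values are made up, and only two of the PyClone value columns are included.

```r
library(reshape2)

# Toy long-format table: one row per mutation per sample (hypothetical values).
long <- data.frame(
  mutation_id = rep(c("m1", "m2"), each = 2),
  sample_id = rep(c("S1", "S2"), times = 2),
  cluster_id = rep(c(0, 1), each = 2),
  cellular_prevalence = c(0.9, 0.8, 0.3, 0.1),
  variant_allele_frequency = c(0.45, 0.40, 0.15, 0.05),
  stringsAsFactors = FALSE
)

# Stack the value columns, then spread them into one column per sample/value pair.
molten <- melt(long, id.vars = c("mutation_id", "cluster_id", "sample_id"))
wide <- dcast(molten, cluster_id + mutation_id ~ sample_id + variable)

print(wide)
```

The result has one row per mutation and one column per sample and value type, named `<sample_id>_<variable>`.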

Cluster IDs have to start from 1, so we add 1 to each cluster ID (based on the ClonEvol manual)

    pyCloneResultMeltDcastDf$cluster_id <- pyCloneResultMeltDcastDf$cluster_id + 1

To shorten vaf column names: "_variant_allele_frequency" -> "_vaf", "_cellular_prevalence" -> "_ccf", "---sampld-WBC" -> ""

    #https://stackoverflow.com/questions/28700987/data-table-setnames-combined-with-regex

    setnames(pyCloneResultMeltDcastDf, names(pyCloneResultMeltDcastDf), gsub("_variant_allele_frequency", "_vaf", names(pyCloneResultMeltDcastDf)))
    setnames(pyCloneResultMeltDcastDf, names(pyCloneResultMeltDcastDf), gsub("_cellular_prevalence", "_ccf", names(pyCloneResultMeltDcastDf)))

To remove the normal sample information ([Tumor---Normal_vaf] -> [Tumor_vaf])

    setnames(pyCloneResultMeltDcastDf, names(pyCloneResultMeltDcastDf), gsub("---\\S+-\\S+_", "_", names(pyCloneResultMeltDcastDf)))

To change each - (hyphen) into _ (underscore)

    setnames(pyCloneResultMeltDcastDf, names(pyCloneResultMeltDcastDf), gsub("-", "_", names(pyCloneResultMeltDcastDf)))
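To see what the four renaming steps do, they can be applied to a plain character vector. The column names below are hypothetical; in particular, the `<tumor>---<normal>` sample ID format is an assumption inferred from the comments above.

```r
# Hypothetical column names as they might look right after dcast().
cols <- c("cluster_id", "mutation_id",
          "P1-T1-D---P1-WBC-D_cellular_prevalence",
          "P1-T1-D---P1-WBC-D_variant_allele_frequency")

cols <- gsub("_variant_allele_frequency", "_vaf", cols)  # shorten VAF suffix
cols <- gsub("_cellular_prevalence", "_ccf", cols)       # shorten CCF suffix
cols <- gsub("---\\S+-\\S+_", "_", cols)                 # drop the normal sample part
cols <- gsub("-", "_", cols)                             # hyphens to underscores

print(cols)
```

Checking the names this way before calling `setnames()` makes it easier to confirm that the `_vaf` and `_ccf$` patterns used afterwards will match the intended columns.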

    vaf.col.names <- grep('_vaf', colnames(pyCloneResultMeltDcastDf), value=T)
    ccf.col.names <- grep('_ccf$', colnames(pyCloneResultMeltDcastDf), value=T)
    sample.names <- gsub('_vaf', '', vaf.col.names)

We use the sample names as the VAF column names (multiplied by 100 to express VAF as a percentage)

    pyCloneResultMeltDcastDf[, sample.names] <- pyCloneResultMeltDcastDf[, vaf.col.names] * 100
    vaf.col.names <- sample.names

We multiply the CCF columns by 100 (from proportion to percentage)

    pyCloneResultMeltDcastDf[, ccf.col.names] <- pyCloneResultMeltDcastDf[, ccf.col.names] * 100

    # prepare sample grouping
    #sample.groups <-sample.names
    sample.groups <- c("C", "M")
    names(sample.groups) <- sample.names

    # setup the order of clusters to display in various plots (later)
    pyCloneResultMeltDcastDf <- pyCloneResultMeltDcastDf[order(pyCloneResultMeltDcastDf$cluster_id),]

    # Create the column corresponding to is.driver, using CGC (Cancer Gene Census) genes as driver genes

    # Load CGC genes
    cgc.file <- file.path("/BiO/Share/Database/COSMIC/grch37/v90/cancer_gene_census.csv")
    cgc.df = read.csv(cgc.file, as.is = T)
    cgc.genes = unique(cgc.df$Gene.Symbol)

    pyCloneResultMeltDcastDf$CGC <- sapply(strsplit(pyCloneResultMeltDcastDf$mutation_id, "_"), function(x) x[1]) %in% cgc.genes
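Note that the line above stores the flag in a column named `CGC`, while the plotting call later in this post passes `highlight = 'is.driver'`. A minimal sketch of the same gene lookup, assuming the flag is meant to live in a column called `is.driver`; the mutation IDs (format `GENE_chrom_pos`) and the gene list are hypothetical:

```r
# Toy variant table with hypothetical mutation IDs of the form GENE_chrom_pos.
variants <- data.frame(
  mutation_id = c("TP53_17_100", "GENE1_1_200", "KRAS_12_300"),
  stringsAsFactors = FALSE
)

# Toy driver gene list standing in for the CGC gene symbols.
driver.genes <- c("TP53", "KRAS")

# Take the gene symbol (text before the first underscore) and flag known drivers.
gene <- sapply(strsplit(variants$mutation_id, "_"), function(x) x[1])
variants$is.driver <- gene %in% driver.genes

print(variants)
```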

    #Choosing colors for the clones
    clone.colors <- NULL

Visualizing the variant clusters

    outputFile <- gsub(pattern="loci_results.txt", replacement="loci_results_jitter.pdf", x = pyCloneResult)

    pdf(outputFile, width = 3, height = 3, useDingbats = FALSE, title='')
    pp <- plot.variant.clusters(pyCloneResultMeltDcastDf,
                                cluster.col.name = 'cluster',
                                show.cluster.size = FALSE,
                                cluster.size.text.color = 'blue',
                                vaf.col.names = vaf.col.names,
                                vaf.limits = 70,
                                sample.title.size = 10,
                                violin = FALSE,
                                box = FALSE,
                                jitter = TRUE,
                                jitter.shape = 1,
                                jitter.color = clone.colors,
                                jitter.size = 2,
                                jitter.alpha = 1,
                                jitter.center.method = 'median',
                                jitter.center.size = 1,
                                jitter.center.color = 'darkgray',
                                jitter.center.display.value = 'none',
                                highlight = 'is.driver',
                                highlight.shape = 21,
                                highlight.color = 'blue',
                                highlight.fill.color = 'green',
                                highlight.note.col.name = 'mutation_id',
                                highlight.note.size = 2,
                                order.by.total.vaf = FALSE)
    dev.off()

>> Here is the result

KRCMC01270.PyClone.loci_results_jitter.pdf

    #Plotting mean/median of clusters across samples (cluster flow)
    plot.cluster.flow(pyCloneResultMeltDcastDf, vaf.col.names = vaf.col.names,
                      sample.names = sample.names,
                      colors = clone.colors)

Here is the result.

########################################################################

Inferring clonal evolution trees

    y = infer.clonal.models(variants = pyCloneResultMeltDcastDf,
                            cluster.col.name = 'cluster',
                            #vaf.col.names = vaf.col.names,
                            ccf.col.names = ccf.col.names,
                            sample.groups = sample.groups,
                            cancer.initiation.model='monoclonal',
                            subclonal.test = 'bootstrap',
                            subclonal.test.model = 'non-parametric',
                            num.boots = 1000,
                            founding.cluster = 1,
                            cluster.center = 'mean',
                            ignore.clusters = NULL,
                            clone.colors = clone.colors,
                            min.cluster.vaf = 0.01,
                            # min probability that CCF(clone) is non-negative
                            sum.p = 0.05,
                            # alpha level in confidence interval estimate for CCF(clone)
                            alpha = 0.05)

########################################################################

The following are the error messages:

    Calculate VAF as CCF/2
    Sample 1: KRCMC01270_T1_D_ccf <-- KRCMC01270_T1_D_ccf
    Sample 2: KRCMC01270_T2_D_ccf <-- KRCMC01270_T2_D_ccf
    Using monoclonal model
    Note: all VAFs were divided by 100 to convert from percentage to proportion.
    Generating non-parametric boostrap samples...
    KRCMC01270_T1_D_ccf : Enumerating clonal architectures...
    Determining if cluster VAF is significantly positive...
    Exluding clusters whose VAF < min.cluster.vaf=0.01
    Non-positive VAF clusters:
    KRCMC01270_T1_D_ccf : 0 clonal architecture model(s) found

      lab       vaf   color parent ancestors occupied      free free.mean
    4   4 0.4168754 #cab2d6     NA         -        0 0.4168754        NA
    5   5 0.3003359 #ff99ff     NA         -        0 0.3003359        NA
    3   3 0.2887949 #b2df8a     NA         -        0 0.2887949        NA
    9   9 0.2780810 #cf8d30     NA         -        0 0.2780810        NA
    6   6 0.2759430 #fdbf6f     NA         -        0 0.2759430        NA
    2   2 0.2343575 #a6cee3     NA         -        0 0.2343575        NA
    8   8 0.2068802 #bbbb77     NA         -        0 0.2068802        NA
    7   7 0.1714719 #fb9a99     NA         -        0 0.1714719        NA
    1   1 0.1211232 #cccccc     NA         -        0 0.1211232        NA
      free.lower free.upper free.confident.level free.confident.level.non.negative
    4         NA         NA                   NA                                NA
    5         NA         NA                   NA                                NA
    3         NA         NA                   NA                                NA
    9         NA         NA                   NA                                NA
    6         NA         NA                   NA                                NA
    2         NA         NA                   NA                                NA
    8         NA         NA                   NA                                NA
    7         NA         NA                   NA                                NA
    1         NA         NA                   NA                                NA
      p.value num.subclones excluded
    4      NA             0    FALSE
    5      NA             0    FALSE
    3      NA             0    FALSE
    9      NA             0    FALSE
    6      NA             0    FALSE
    2      NA             0    FALSE
    8      NA             0    FALSE
    7      NA             0    FALSE
    1      NA             0    FALSE
    ERROR: No clonal models for sample: KRCMC01270_T1_D_ccf
    Check data or remove this sample, then re-run.

    Also check if founding.cluster was set correctly!
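The hint about `founding.cluster` can be checked against the data. In the table above, cluster 1 has the lowest mean VAF (0.121) although it is passed as `founding.cluster = 1`. A founding clone is normally expected to have the highest CCF in every sample, because all other clones descend from it. Whether that explains this particular failure is an assumption, not something confirmed in this thread. The sketch below, in base R with made-up column names and values, computes the mean CCF per cluster and reports which cluster is highest in each sample:

```r
# Toy variant table: hypothetical column names and CCF values (in percent).
variants <- data.frame(
  cluster = c(1, 1, 2, 2, 3, 3),
  S1_ccf  = c(20, 24, 80, 84, 50, 54),
  S2_ccf  = c(10, 14, 90, 86, 30, 34)
)
ccf.cols <- c("S1_ccf", "S2_ccf")

# Mean CCF of each cluster in each sample.
centers <- aggregate(variants[, ccf.cols],
                     by = list(cluster = variants$cluster),
                     FUN = mean)

# For each sample, the cluster with the highest mean CCF.
top.cluster <- sapply(ccf.cols, function(col) {
  centers$cluster[which.max(centers[[col]])]
})

print(centers)
print(top.cluster)
```

If the same cluster comes out on top in all samples and it is not the one given as `founding.cluster`, renumbering the clusters or changing that argument may be worth trying.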

Could you give me any idea how to solve this problem?

I suspect the PyClone result is not very good, because most variants fall in cluster 1.

Thank you in advance for your time

Sincerely,

Seung-hoon

xmzhuo commented 3 years ago

I have a similar issue. The input is from PyClone-VI with WGS data. The cluster table (cluster ID on top, number of variants below):

       1    2    3    4    5
      10 1805  203  116 1471

The code I ran:

    mutli_full_infer = infer.clonal.models(variants = multi_full,
                                           cluster.col.name = 'cluster',
                                           ccf.col.names = paste(c('A','B'), 'ccf', sep=''),
                                           sample.groups = sample_groups,
                                           cancer.initiation.model = 'monoclonal',
                                           subclonal.test = 'bootstrap',
                                           subclonal.test.model = 'non-parametric',
                                           num.boots = 1000,
                                           founding.cluster = 1,
                                           cluster.center = 'mean',
                                           ignore.clusters = NULL,
                                           clone.colors = clone.colors,
                                           min.cluster.vaf = 0.01,
                                           sum.p = 0.05,
                                           alpha = 0.05)

Error message:

    Calculate VAF as CCF/2
    Sample 1: Accf <-- Accf
    Sample 2: Bccf <-- Bccf
    Using monoclonal model
    Note: all VAFs were divided by 100 to convert from percentage to proportion.
    Generating non-parametric boostrap samples...
    Accf : Enumerating clonal architectures...
    Determining if cluster VAF is significantly positive...
    Exluding clusters whose VAF < min.cluster.vaf=0.01
    Non-positive VAF clusters:
    Accf : 0 clonal architecture model(s) found

      lab     vaf   color parent ancestors occupied    free free.mean free.lower
    4   4 0.42025 #cab2d6     NA         -        0 0.42025        NA         NA
    5   5 0.27755 #ff99ff     NA         -        0 0.27755        NA         NA
    2   2 0.16680 #a6cee3     NA         -        0 0.16680        NA         NA
    3   3 0.09810 #b2df8a     NA         -        0 0.09810        NA         NA
    1   1 0.03360 #cccccc     NA         -        0 0.03360        NA         NA
      free.upper free.confident.level free.confident.level.non.negative p.value
    4         NA                   NA                                NA      NA
    5         NA                   NA                                NA      NA
    2         NA                   NA                                NA      NA
    3         NA                   NA                                NA      NA
    1         NA                   NA                                NA      NA
      num.subclones excluded
    4             0    FALSE
    5             0    FALSE
    2             0    FALSE
    3             0    FALSE
    1             0    FALSE
    ERROR: No clonal models for sample: Accf
    Check data or remove this sample, then re-run.

    Also check if founding.cluster was set correctly!

edceeyuchen commented 1 year ago

@hoonghim Hello hoonghim, I ran into the same problem. How did you solve it? I hope you can help; it would be very helpful for me!

seunghoonv commented 1 year ago

Hi, edceeyuchen

Unfortunately, I couldn't solve the issue. And the author didn't reply to my question (maybe he is busy...).

It has been about 4 years, and I still have not solved this problem.

I think it would be helpful to look for papers that use ClonEvol and provide their custom scripts in the code availability section.

Sorry for not being helpful.

Seunghoon

snowvov commented 8 months ago

Hello, try using corrected VAF or CCF. See: https://github.com/hdng/clonevol/issues/21

oghzzang commented 7 months ago

OMG. I found something. @edceeyuchen

I got the same error, but after I changed the option from "monoclonal" to "polyclonal", it worked well!!

hdng / clonevol

Error in infer.clonal.models: No clonal models for sample #33