hdng / clonevol

Inferring and visualizing clonal evolution in multi-sample cancer sequencing
GNU General Public License v3.0

Issue infer.clonal.models function #37

Closed tomasellimichelle closed 4 years ago

tomasellimichelle commented 4 years ago

Dear Ha X. Dang,

I am analysing four spatially distinct samples from one cancer. I was able to run ClonEvol up to the infer.clonal.models function, but it could not find any consensus models across samples. Could you help me work out what I did wrong, or what I could do to obtain proper models?

---- Below you will find the code used ----

I ran SciClone and obtained a file with the following header:

chr st sci_EPC3205_01.ref sci_EPC3205_01.var sci_EPC3205_01.vaf sci_EPC3205_01.cn sci_EPC3205_01.cleancn sci_EPC3205_01.depth sci_EPC3205_02.ref sci_EPC3205_02.var sci_EPC3205_02.vaf sci_EPC3205_02.cn sci_EPC3205_02.cleancn sci_EPC3205_02.depth sci_EPC3205_03.ref sci_EPC3205_03.var sci_EPC3205_03.vaf sci_EPC3205_03.cn sci_EPC3205_03.cleancn sci_EPC3205_03.depth sci_EPC3205_04.ref sci_EPC3205_04.var sci_EPC3205_04.vaf sci_EPC3205_04.cn sci_EPC3205_04.cleancn sci_EPC3205_04.depth adequateDepth cluster cluster.prob.1 cluster.prob.2 cluster.prob.3 cluster.prob.4 cluster.prob.5 cluster.prob.6 cluster.prob.7 cluster.prob.8 cluster.prob.9

library(clonevol)

library(devtools)

sample1 <- "EPC3205_01"

sample2 <-"EPC3205_02"

sample3 <- "EPC3205_03"

sample4 <- "EPC3205_04"

iinformation <- "EPC3205_minlen20"  # output file from SciClone (clusters)

patient <- "EPC3205"

numcolorsqtt <- 9

v1 <- read.delim2(file=iinformation, header=TRUE, sep="\t", dec=".", stringsAsFactors = FALSE)

# Keep the table as a data frame for easier handling, then delete rows containing NAs

ff <- v1

delete.na <- function(DF, n=0) { DF[rowSums(is.na(DF)) <= n,] }

ffy <- delete.na(ff)

x <- ffy

vaf.col.names <- grep('\\.vaf$', colnames(x), value = TRUE)

sample.names <- gsub('\\.vaf$', '', vaf.col.names)

x[, sample.names] <- x[, vaf.col.names]

vaf.col.names <- sample.names

sample.groups <- c(sample1, sample2, sample3 , sample4)

names(sample.groups) <- vaf.col.names

x <- x[order(x$cluster),]

library("colorspace")

clone.colors <- sequential_hcl(numcolorsqtt,palette = "Blue-Yellow")

pdf(paste0(patient,'_cluster_minlength_20.pdf'), useDingbats = FALSE, title='cluster_minlength_20')

pp <- plot.variant.clusters(x,
    cluster.col.name = 'cluster',
    show.cluster.size = FALSE,
    cluster.size.text.color = 'blue',
    vaf.col.names = vaf.col.names,
    vaf.limits = 70,
    sample.title.size = 8,
    violin = FALSE,
    box = TRUE,
    jitter = TRUE,
    jitter.shape = 1,
    jitter.color = clone.colors,
    jitter.size = 1,
    jitter.alpha = 1,
    jitter.center.method = 'median',
    jitter.center.size = 1,
    jitter.center.color = 'darkgray',
    jitter.center.display.value = 'none',
    highlight.shape = 21,
    highlight.color = 'blue',
    highlight.fill.color = 'green',
    highlight.note.col.name = 'gene',
    highlight.note.size = 2,
    order.by.total.vaf = FALSE)

dev.off()

-------------------- console output --------------------

Warning messages:

1: fun.y is deprecated. Use fun instead.

2: fun.y is deprecated. Use fun instead.

3: fun.y is deprecated. Use fun instead.

4: fun.y is deprecated. Use fun instead.

5: Removed 15 rows containing non-finite values (stat_summary).

6: Removed 15 rows containing non-finite values (stat_boxplot).

7: Removed 15 rows containing missing values (geom_point).

8: Removed 1 rows containing non-finite values (stat_summary).

9: Removed 1 rows containing non-finite values (stat_boxplot).

10: Removed 1 rows containing missing values (geom_point).


plot.pairwise(x, col.names = vaf.col.names, out.prefix = paste0(patient,'_variants_minlength20.plot'), colors = clone.colors)

-------------------- console output --------------------

Warning messages:

1: Removed 15 rows containing missing values (geom_point).

2: Removed 15 rows containing missing values (geom_point).

3: Removed 16 rows containing missing values (geom_point).

4: Removed 1 rows containing missing values (geom_point).

5: Removed 1 rows containing missing values (geom_point).


pdf(paste0(patient,'_flow_minlength_20.pdf'),width=10, height=5, useDingbats=FALSE, title='flow_minlength_20.pdf')

plot.cluster.flow(x, vaf.col.names = vaf.col.names, sample.names = c(sample1, sample2, sample3 , sample4), colors = clone.colors)

dev.off()

y = infer.clonal.models(variants = x,
    cluster.col.name = 'cluster',
    vaf.col.names = vaf.col.names,
    sample.groups = sample.groups,
    vaf.in.percent = TRUE,
    cancer.initiation.model = 'monoclonal',
    subclonal.test = 'bootstrap',
    subclonal.test.model = 'non-parametric',
    num.boots = 1000,
    founding.cluster = 1,
    cluster.center = 'mean',
    ignore.clusters = NULL,
    merge.similar.samples = TRUE,
    clone.colors = clone.colors,
    min.cluster.vaf = 0.01,
    seeding.aware.tree.pruning = FALSE,
    sum.p = 0.05,
    alpha = 0.05)

-------------------- console output --------------------

Sample 1: sci_EPC3205_01 <-- sci_EPC3205_01

Sample 2: sci_EPC3205_02 <-- sci_EPC3205_02

Sample 3: sci_EPC3205_03 <-- sci_EPC3205_03

Sample 4: sci_EPC3205_04 <-- sci_EPC3205_04

Using monoclonal model

Note: all VAFs were divided by 100 to convert from percentage to proportion.
Generating non-parametric boostrap samples...

sci_EPC3205_01 : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters: 5,9
sci_EPC3205_01 : 48 clonal architecture model(s) found

sci_EPC3205_02 : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters: 2,7
sci_EPC3205_02 : 48 clonal architecture model(s) found

sci_EPC3205_03 : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters: 3,6
sci_EPC3205_03 : 44 clonal architecture model(s) found

sci_EPC3205_04 : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters: 4,7,9
sci_EPC3205_04 : 84 clonal architecture model(s) found

Finding consensus models across samples...
Found 0 consensus model(s)
Found 0 consensus model(s)
Scoring models...
Pruning consensus clonal evolution trees....
Seeding aware pruning is: off
Number of unique pruned consensus trees: 0
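
Editor's note: in the log above, a different set of clusters is reported as non-positive in each sample (5,9; 2,7; 3,6; 4,7,9). One way to inspect this before calling infer.clonal.models is to tabulate the mean VAF of every cluster in every sample. The following is a minimal base R sketch on a made-up table; the column names `s1.vaf` and `s2.vaf`, the helper `cluster.means`, and the cutoff of 1 (percent) are assumptions for illustration and are not part of ClonEvol.

```r
# Toy variant table standing in for the SciClone output (values are made up).
x <- data.frame(
  cluster = c(1, 1, 2, 2, 3, 3),
  s1.vaf  = c(48, 52, 20, 24, 0, 0),
  s2.vaf  = c(45, 47, 0, 0, 30, 34)
)

# Hypothetical helper: mean VAF of each cluster in each sample.
cluster.means <- function(df, vaf.cols, cluster.col = 'cluster') {
  aggregate(df[, vaf.cols, drop = FALSE],
            by = list(cluster = df[[cluster.col]]),
            FUN = mean)
}

centers <- cluster.means(x, c('s1.vaf', 's2.vaf'))
print(centers)

# Flag clusters whose mean VAF falls below an assumed cutoff of 1 (percent),
# i.e. clusters that look absent from a given sample.
absent <- centers[, -1] < 1
rownames(absent) <- centers$cluster
print(absent)
```

A table like `absent` makes it easy to see which clusters appear in only a subset of samples, which may help when deciding whether clusters should be merged, ignored, or re-clustered.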


Also, I noticed that the plots can show gene names. Do I have to supply this information during the SciClone step, or is there a way to add it now? And are the gene names meant to be a list of detected driver genes, or of all variants located within genes?

Many thanks for your time and help, Michelle.
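
Editor's note: the plot.variant.clusters call above passes highlight.note.col.name = 'gene', which suggests the labels are read from a column of the variant data frame. If so, the annotation could be joined onto the table after clustering. The sketch below assumes a separate annotation table keyed by `chr` and `st`; the table `annot`, its gene names, and the `is.driver` column are hypothetical and only illustrate the join.

```r
# Toy variant table (positions and clusters are made up).
x <- data.frame(
  chr = c('1', '2', '7'),
  st  = c(1000, 2000, 3000),
  cluster = c(1, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical annotation table, e.g. exported from a variant annotator.
annot <- data.frame(
  chr  = c('1', '7'),
  st   = c(1000, 3000),
  gene = c('GENE_A', 'GENE_B'),
  is.driver = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Left join on position; variants without an annotation get NA.
x <- merge(x, annot, by = c('chr', 'st'), all.x = TRUE)

# Treat unannotated variants as non-drivers and restore the cluster ordering.
x$is.driver[is.na(x$is.driver)] <- FALSE
x <- x[order(x$cluster), ]
print(x)
```

Whether every annotated variant or only drivers get labelled would then depend on which rows are flagged for highlighting, so the choice stays with the user.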

shaghayeghsoudi commented 1 year ago

Hey, I am running into the same problem: no consensus trees across samples. Did you find a solution for that? Thanks

tomasellimichelle commented 1 year ago

Good morning,

Yes, after many tries I found that you must have homogeneous raw data. Try filtering your variants.

Thanks, Michelle.
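
Editor's note: the thread does not say which filters were applied. One plausible reading, given the `.depth` and `.cn` columns in the SciClone header, is to keep only variants with adequate depth and copy number 2 in every sample before clustering and tree inference. The sketch below shows that idea on a made-up table; the threshold `min.depth` and the column names are assumptions, not the author's confirmed method.

```r
# Toy table mimicking the SciClone column layout (values are made up).
x <- data.frame(
  s1.vaf = c(45, 30, 12, 50), s1.depth = c(120, 15, 200, 90), s1.cn = c(2, 2, 3, 2),
  s2.vaf = c(40, 28, 10, 0),  s2.depth = c(100, 80, 150, 110), s2.cn = c(2, 2, 2, 2),
  cluster = c(1, 2, 2, 3)
)

# Hypothetical threshold; tune it to the data at hand.
min.depth <- 50
depth.cols <- grep('\\.depth$', colnames(x), value = TRUE)
cn.cols    <- grep('\\.cn$', colnames(x), value = TRUE)

# Keep variants with adequate depth and copy number 2 in every sample.
keep <- rowSums(x[, depth.cols, drop = FALSE] >= min.depth) == length(depth.cols) &
        rowSums(x[, cn.cols, drop = FALSE] == 2) == length(cn.cols)
x.filtered <- x[keep, ]

# Number of variants left per cluster after filtering.
print(table(x.filtered$cluster))
```

After filtering, it is worth checking that each cluster still contains enough variants, since a cluster can shrink or disappear entirely.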


shaghayeghsoudi commented 1 year ago

Thank you for your kind message. Could you please explain a little more? What type of filtering are you referring to?