hdng / clonevol

Inferring and visualizing clonal evolution in multi-sample cancer sequencing
GNU General Public License v3.0

error while running infer.clonal.models() #11

Closed gunjangala closed 6 years ago

gunjangala commented 7 years ago

Hello, I get the following error when running infer.clonal.models(). Can you please help me fix this? I have attached the input data.txt file for your reference.

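# Read the attached data.txt (assumes a whitespace-delimited table with a header row)
x <- read.table('data.txt', header = TRUE, stringsAsFactors = FALSE)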

# Escape the dot and anchor at the end so only columns ending in ".vaf" match
vaf.col.names <- grep('\\.vaf$', colnames(x), value = TRUE)
sample.names <- gsub('\\.vaf$', '', vaf.col.names)
x[, sample.names] <- x[, vaf.col.names]
vaf.col.names <- sample.names
sample.groups <- c("G","D","B","R")
names(sample.groups) <- vaf.col.names
x <- x[order(x$cluster),]
clone.colors <- NULL
pp <- variant.box.plot(x,
                       cluster.col.name = 'cluster',
                       show.cluster.size = FALSE,
                       cluster.size.text.color = 'blue',
                       vaf.col.names = vaf.col.names,
                       vaf.limits = 70,
                       sample.title.size = 20,
                       violin = FALSE,
                       box = FALSE,
                       jitter = TRUE,
                       jitter.shape = 1,
                       jitter.color = clone.colors,
                       jitter.size = 3,
                       jitter.alpha = 1,
                       jitter.center.method = 'median',
                       jitter.center.size = 1,
                       jitter.center.color = 'darkgray',
                       jitter.center.display.value = 'none',
                       highlight = 'is.driver',
                       highlight.note.col.name = 'gene',
                       highlight.note.size = 2,
                       highlight.shape = 16,
                       order.by.total.vaf = FALSE
)

> y = infer.clonal.models( variants=x,
+                         cluster.col.name = 'cluster',
+                         vaf.col.names = vaf.col.names,
+                         sample.groups = sample.groups,
+                         subclonal.test = 'bootstrap',
+                         subclonal.test.model = 'non-parametric',
+                         num.boots = 1000,
+                         #founding.cluster = '1',
+                         #cluster.center = 'mean',
+                         #ignore.clusters = NULL,
+                         clone.colors = clone.colors,
+                         min.cluster.vaf = 0.01,
+                         sum.p = 0.05,
+                         alpha = 0.05,
+                         ignore.clusters=T)
Sample 1: G <-- G
Sample 2: D <-- D
Sample 3: B <-- B
Sample 4: R <-- R
Using monoclonal model
Note: all VAFs were divided by 100 to convert from percentage to proportion.
Generating non-parametric boostrap samples...
G : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters: 1,2,3 
User ignored clusters:   
G : 3 clonal architecture model(s) found

D : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters: 1,3 
User ignored clusters:   
D : 1 clonal architecture model(s) found

B : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters: 2,3 
User ignored clusters:   
B : 1 clonal architecture model(s) found

R : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters: 1,2 
User ignored clusters:   
R : 1 clonal architecture model(s) found

Finding consensus models across samples...
Found  3 consensus model(s)
Generating consensus clonal evolution trees across samples...
Error in merge.clone.trees(m, samples = samples, sample.groups, merge.similar.samples = merge.similar.samples) : 
  ERROR: Something wrong. No clones left after filter. They might have been excluded.

Thanks, Gunjan

hdng commented 7 years ago

Your clustering looks atypical. Looking at it, I couldn't figure out which clone should be the founding clone (assuming monoclonal cancer initiation). The founding clone typically has many variants and is present in all samples with the highest VAF. Your samples also seem to have low purity (VAF is almost zero in G and very low in D). Could you provide more detail on the samples, the data types, and how the variants were processed prior to running clonevol?

See: box.pdf
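
The founding-clone criterion described above (a cluster that is present in every sample and has the highest VAF in each) can be checked before running infer.clonal.models(). The following is a minimal sketch in base R, not part of clonevol. The data frame is toy data made up for illustration (it is not the attached data.txt), and `min.cluster.vaf` here is a hypothetical presence threshold expressed in percent.

```r
# Toy data for illustration only (not the reporter's data.txt).
# VAFs are percentages, one column per sample.
x <- data.frame(
  cluster = c(1, 1, 1, 2, 2, 3, 3),
  G = c(45, 50, 48, 20, 22,  0,  0),
  D = c(40, 42, 44,  0,  0, 15, 18),
  B = c(47, 49, 46, 10, 12,  0,  0),
  R = c(43, 41, 45,  0,  0, 30, 28)
)
vaf.col.names <- c("G", "D", "B", "R")
min.cluster.vaf <- 1  # percent; hypothetical presence threshold

# Median VAF of each cluster in each sample (rows = clusters, columns = samples).
centers <- aggregate(x[, vaf.col.names],
                     by = list(cluster = x$cluster),
                     FUN = median)
vaf <- as.matrix(centers[, vaf.col.names])

# A founding-clone candidate must be present in every sample ...
present.everywhere <- rowSums(vaf > min.cluster.vaf) == ncol(vaf)

# ... and must have the highest median VAF in every sample.
col.max <- apply(vaf, 2, max)
highest.everywhere <- apply(vaf, 1, function(v) all(v >= col.max))

founding.candidates <- centers$cluster[present.everywhere & highest.everywhere]
print(centers)
print(founding.candidates)
```

If `founding.candidates` comes back empty on real data, no cluster satisfies the criterion, which matches the situation described in this thread.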

gunjangala commented 7 years ago

Sorry for the late reply. The clustering was done using the k-means algorithm. We only have allele frequencies for D, B and R. G was the initial stage, which is why we set its frequency to zero.

hdng commented 7 years ago

Without the details of the data/samples, I can only give generic advice. I'm not sure whether you filtered variants prior to clustering, but if you did, I would try relaxing the filter to get more variants, and plot their VAFs (without clustering) to see their range and distribution. If most variants have low VAFs, either all of your samples have low purity (which is a barrier to this analysis) or there are only a few founding mutations (which will require careful clustering to reveal). It is also worth trying SciClone and PyClone, which are better tools for variant clustering than k-means.
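
One way to look at the VAF range and distribution without any clustering is sketched below in base R. This is only an illustration: the VAFs are randomly generated toy values (not the attached data), the sample names D, B and R are taken from the thread, and `low.cutoff` is a hypothetical threshold for what counts as a "low" VAF.

```r
# Toy VAFs (percent) for illustration only; replace with the real variant table.
set.seed(1)
x <- data.frame(
  D = runif(200, 0, 50),
  B = runif(200, 0, 50),
  R = runif(200, 0, 50)
)
vaf.col.names <- c("D", "B", "R")

# Range and quartiles of VAF in each sample, ignoring any cluster labels.
vaf.summary <- sapply(x[, vaf.col.names], quantile,
                      probs = c(0, 0.25, 0.5, 0.75, 1))
print(vaf.summary)

# Fraction of variants in each sample below a hypothetical "low VAF" cutoff.
low.cutoff <- 10  # percent
frac.low <- colMeans(x[, vaf.col.names] < low.cutoff)
print(frac.low)

# One histogram per sample, written side by side to a PDF file.
out.file <- file.path(tempdir(), "vaf_hist.pdf")
pdf(out.file, width = 9, height = 3)
par(mfrow = c(1, length(vaf.col.names)))
for (s in vaf.col.names) {
  hist(x[[s]], breaks = seq(0, 100, by = 5), main = s, xlab = "VAF (%)")
}
invisible(dev.off())
```

A large `frac.low` in every sample would point to the low-purity or few-founding-mutations scenarios mentioned above.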