hdng / clonevol

Inferring and visualizing clonal evolution in multi-sample cancer sequencing
GNU General Public License v3.0

question: specifying time-points & working with low VAFs #28

Closed paularstrpo closed 5 years ago

paularstrpo commented 5 years ago

Hi. I'm trying to set up a sciclone-clonevol-fishplot workflow to derive the clonal evolution of my samples. I have a case study of 4 WES samples: a primary tumor (extracted at time point 1) and three different regional recurrences (extracted at time point 2). My WES coverage varies from 10X to 30X depth, and my VAFs top out at around 30%. They are all from the same tissue/organ. I ran sciclone on my samples, and when I feed its results into clonevol, the resulting model seems to assume the samples are all from the same time point. How can I specify the known time point of each sample so that clonevol can take it into account when inferring the model? Is that possible? Further, what do you recommend in cases like this with low-purity tumors?

hdng commented 5 years ago

Clonevol does not make any assumption about when the samples were taken. The samples can be taken at the same time point or at many different time points. Clonevol infers models for individual samples separately and then cross-compares the models between samples to find the consensus model that is valid for all samples. If clonevol infers a model, it should be valid in both cases: same or different time points.

Because clonevol does not make any assumption about time points, its visualization places all samples at the same time point by default. Clonevol can instead plot samples at the time points provided via the bell.starts parameter of the plot.clonal.models function, e.g.:

s = c(1,2,5)
names(s) = c('sample1', 'sample2', 'sample3')
plot.clonal.models(..., bell.starts=s)
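
For the four-sample case in the question (one primary at time point 1, three recurrences at time point 2), the named vector could be built roughly as in the sketch below. The sample names are hypothetical and would need to match the sample names used in the clonevol run. The plotting call is left commented out, as in the snippet above, because its other arguments are not shown in this thread.

```r
# Hypothetical sample names; replace with the sample names used in your run.
samples <- c("primary", "recurrence1", "recurrence2", "recurrence3")

# Known collection time point per sample (1 = primary, 2 = recurrences).
time_points <- c(1, 2, 2, 2)

# Named numeric vector in the same shape as the example above.
s <- setNames(time_points, samples)
print(s)

# Then pass it on, keeping your other arguments unchanged:
# plot.clonal.models(..., bell.starts = s)
```
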

hdng commented 5 years ago

Also, low VAF suggests normal tissue contamination in your tumor samples. This affects the ability to call variants but doesn't affect clonevol inference (given reasonably good variant calls). Everything is just scaled down by the same factor.
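
As a rough illustration of the scaling argument, the sketch below assumes heterozygous variants in copy-neutral regions, so that the expected VAF is purity times cellular fraction divided by 2. That model, the purity values, and the cellular fractions are all made up for illustration and are not taken from clonevol.

```r
# Expected VAF (as a fraction) for a heterozygous variant in a diploid region.
# This simple model is an assumption for illustration only.
expected_vaf <- function(purity, cell_fraction) purity * cell_fraction / 2

cell_fraction <- c(founding = 1.0, subclone = 0.4)  # made-up clusters

vaf_pure   <- expected_vaf(1.0, cell_fraction)  # no normal contamination
vaf_impure <- expected_vaf(0.6, cell_fraction)  # 40% normal cells

# Every cluster shrinks by the same factor (the purity), so the ratio
# between the subclone and the founding clone is unchanged.
ratio_pure   <- vaf_pure[["subclone"]] / vaf_pure[["founding"]]
ratio_impure <- vaf_impure[["subclone"]] / vaf_impure[["founding"]]
print(c(ratio_pure, ratio_impure))
```
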

Tina9 commented 5 years ago

Hello,

I use pyclone to infer subclones, and I have a question about VAF. I use the variant_allele_frequenc estimated by pyclone as the VAF input to ClonEvol. The values I get are all less than 1. After reading the ClonEvol tutorial, I found that VAF is calculated the same way there (the ratio of the number of reads carrying the variant to the total number of reads at the site). If VAF is defined as above, it should be less than 1, right? However, the values in the test data provided in the tutorial are all greater than 1.

hdng commented 5 years ago

VAF is always < 1 (or 100%). In the test data, it is provided as a percentage and should be < 100. Regarding pyclone as input, you have two options:

(1) Divide the cellular fraction estimate from pyclone by 2 to get the "CN corrected VAF". This is preferred, but keep in mind that pyclone may limit the cellular fraction to 1, which then limits the corrected VAF to 0.5. See related discussion in: https://github.com/hdng/clonevol/issues/3 and https://github.com/hdng/clonevol/issues/4

(2) Use the uncorrected VAF (calculated as variant reads/total reads) with the assumption that you either don't have CNAs affecting your variants, or you have even copy gain vs. loss between variants within clusters such that the center (e.g. mean VAF) of a cluster is not affected by copy number alteration.
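
The two options can be sketched as follows. The data frame and its column names are hypothetical stand-ins for a pyclone-derived table, and the numbers are made up. The results are expressed as percentages to match the tutorial's test data.

```r
# Hypothetical per-variant table; column names are assumptions, not pyclone's.
variants <- data.frame(
  cluster           = c(1, 1, 2),
  cellular_fraction = c(1.00, 0.96, 0.40),  # estimated fraction of cancer cells
  variant_reads     = c(14, 12, 5),
  total_reads       = c(30, 28, 25)
)

# Option 1: CN-corrected VAF = cellular fraction / 2, as a percentage.
# A cellular fraction capped at 1 caps this value at 50.
variants$vaf_corrected <- variants$cellular_fraction / 2 * 100

# Option 2: uncorrected VAF = variant reads / total reads, as a percentage.
variants$vaf_raw <- variants$variant_reads / variants$total_reads * 100

print(variants)
```
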

Tina9 commented 5 years ago

Thanks for your reply.

blackbeerd commented 5 years ago

Hello, regarding the multiple time points question from the original comment, I have a similar situation, but my calls are from cell-free DNA at 8 different collection dates (the patient has metastatic disease with significant levels of tumor DNA in cfDNA; the highest VAFs are 40-50%). I used pyclone to cluster the variants, and when I tried to use the bell.starts parameter in plot.clonal.models, I received the message "Error in plot.clonal.models(y, box.plot = TRUE, bell.starts = s, fancy.boxplot = TRUE, : unused argument (bell.starts = s)" ... it looks like it's not using the argument?

How do I implement this functionality in plot.clonal.models, and do you have any other advice for using ClonEvol on this type of data set (longitudinal sampling)?
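
In R, an "unused argument" error means the function being called does not declare that parameter and has no `...` to absorb it. One way to check, sketched below, is to inspect the function's formal arguments. The stand-in function and the accepts_arg helper are hypothetical, so the snippet runs without clonevol installed. One possible cause, which is an assumption and not confirmed in this thread, is that the installed clonevol version differs from the one the earlier reply refers to.

```r
# Returns TRUE if function `f` declares an argument called `arg`.
accepts_arg <- function(f, arg) arg %in% names(formals(f))

# Hypothetical stand-in with no bell.starts parameter and no `...`.
old_plot <- function(models, box.plot = FALSE) invisible(NULL)

print(accepts_arg(old_plot, "bell.starts"))  # not declared, so passing it errors
print(accepts_arg(old_plot, "box.plot"))     # declared

# With clonevol loaded, the analogous checks would be:
# accepts_arg(plot.clonal.models, "bell.starts")
# packageVersion("clonevol")
```
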

Thank you for the excellent R package and support!