Closed crazyhottommy closed 6 years ago
Assuming this is monoclonal cancer initiation, the founding.cluster is often the cluster with the highest CCF (or VAF) in most (or in ideal situation, all) samples. 1000 bootstraps is reasonable for first run. The number of samples and the number of clusters affect the performance time more. How many clusters/samples do you have for your case? You may need some clean-up (eg. removing small or noise clusters). Also, are there many clusters with very small CCF? That may increase the number of models to consider.
Thanks! I only have two samples, but do have smaller clusters. I removed the cluster with very few mutations (what's the cut-off? , I removed clusters with only 1 mutation).
it finished much sooner, but none model were found even at p < 0.1.
what is alpha for?
x <- infer.clonal.models(variants= clonevol_input,
cluster.col.name="cluster",
model = "polyclonal",
vaf.col.names=vaf.col.names,
subclonal.test="bootstrap",
subclonal.test.model="non-parametric",
cluster.center="mean",
num.boots=1000,
founding.cluster= NULL,
min.cluster.vaf=0.01,
p.value.cutoff= 0.1,
alpha= 0.5,
random.seed=63108)
Sample 1: Pa26T1.vaf <-- Pa26T1.vaf
Sample 2: Pa26T2.vaf <-- Pa26T2.vaf
Using polyclonal model
Note: all VAFs were divided by 100 to convert from percentage to proportion.
Generating non-parametric boostrap samples...
Pa26T1.vaf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.001
Non-positive VAF clusters:
Pa26T1.vaf : 910 clonal architecture model(s) found
Pa26T2.vaf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.001
Non-positive VAF clusters:
Pa26T2.vaf : 376 clonal architecture model(s) found
Finding matched clonal architecture models across samples...
Found 0 compatible model(s)
Found 0 compatible evolution models
Pruning merged clonal evolution trees....
Number of unique pruned trees: 0
Scoring models...
0 model(s) with p-value <= 0.1
***WARN: Inra-tumor heterogeneity could result in a clone (eg. founding)
that is not present (no cells) in any samples, although detectable via
clonal marker variants due to that its subclones are distinct across
samples. Therefore, a model with a higher p-value for the CCF of such
a clone can still be biologically consistent, interpretable, and
interesting! Manual investigation of those higher p-value models
is recommended
The cutoff depends on how confident you are on the clusters, eg. if your data are deep enough such that you can trust a cluster of >=5 variants, then use 5. However, this is a bit arbitrary though, and there is no formula for this. I would visualize the clusters first (eg. a pairwise plot of CCF of the variants grouped by clusters for the two samples), and check if the clusters look reasonable. Also, it seems that your scaled VAF is from 0-1. The default for clonevol is 0-100 (percentage). This probably does not affect the underlying inference, but it is good to use 1-100 or set vaf.in.percent=FALSE when calling infer.clonal.models. You should also set alpha and p.value.cutoff the same, and try lower value, eg. 0.01.
Thanks! I did scale variant_prevalence by *100/2 to get VAF ranging from 0-50. so alpha and p.value should be the same? what alpha is?
Even after playing with different arguments, I still do not have any matched clonal architecture models across samples. what should I do with them?
I know the paper is under review, I would like to read to understand more of the tool.
Thanks, Tommy
Visualization of the clusters would help identifying problems with the clustering, eg. is there an outlier or abnormal cluster that conflicts between samples or shows sign of polyclonal cancer initiation? Clonevol does require a reasonably good clustering to start with. Alpha is the significant threshold to identify if a clone exists in a sample. I am working on the manual and technical document (takes too long already sorry), and the manuscript should be posted on bioRxiv soon too. Stay tuned. Thanks.
looking forward to read it. at the same time, I attached a cluster VAF scatter plot. clonevol does not output any model, you see any problems with it? Thanks!
Tommy
Looks a little odd. Is this the scaled VAF based on CCF estimated via Pyclone? Should the highest VAF be near 50%?
sorry, when plot the VAF scatter plot, I used the raw output from pyclone, I did not *100/2 to get the scaled VAF. but if I scale it, it will be the same pattern but just different scale on X and Y axis.
However, when I put the data into clonevol, I did scale to 0-50. So, the pattern of the VAF scatter plot look odd ?
Thanks very much for your help!
Sorry for slow response. I think some models can be inferred, but you need a tweak.
1) You can try plotting the models using plot.clonal.models function. If no matched model is there, clonevol will plot all models for individual samples. This helps debugging what clusters are not compatible across samples.
2) Looks like pyclone put a hard cut on the CCF at 100% due to their statistical model. It seems that tumor purity does not affect their model, so I think if you divide your original VAF by 2 to fake that the tumor purity is always < 50%, thus posterior CCF estimated by pyclone is maxed at 50% (equivalent to 100%), while still allowing some to go over 50%, this would make the scaled VAF looks reasonable, and you can multiply scaled VAF by 2 to get back to the original purity, and run clonevol. Don't forget to plot CCF or VAF like you did.
Are their many copy altered variants? If not, you can try to cluster by SciClone as well.
Thank you for your answer.
those two samples are from the same tumor of the same patient , so what's wrong if no compatible models are inferred? I plot the models individually. please see attachment. how to explain it?
you are saying I need to divide the VAF (calculated from samtools mpileup) by 2 before feeding into pyclone? then multiply pyclone output CCF by 2 to run clonevol? what's the advantage to do this?
I also attached the scaled VAF (pyclone CCF *100/2) for you to test if you can.
Thanks again!
model-Pa30T2.vaf.pdf model-Pa30T1.vaf.pdf clonevol_input.txt
My data have a lot of copy number changes, so I did not use sciclone. For pyclone, one has to provide both the tumor purity (inferred from other tools such as sequenza) and the VAF to the tool. so the end results of CFF always ranges from 0 to 1.
What CCF range you'll get if you specify tumor purity of 100% and also divide VAF by 2?
CCF will be always from 0 to 1. pyclone uses the tumor purity and copynumber information in addition to VAF to identify clusters.
Yes it will be from 0-1, but do you see a smooth range when plotting or a hard cut at 1? If it is smooth, then you can use clonevol.
Hi, I have several questions on how to set the parameters for the
infer.clonal.models
function. First, the help page does not explain all the arguments. e.g. how do I set the founding.cluster? in the github page example, there is acluster.center
argument, but I do not see it in the help.How many times of bootstrap usually is enough? default is 1000, and for around 300 mutations, it is taking several hours to run, and not finish yet..
Thanks for giving more information.
Best, Tommy