hdng / clonevol

Inferring and visualizing clonal evolution in multi-sample cancer sequencing
GNU General Public License v3.0

ERROR: No clonal models for sample #12

Closed gyfangel closed 6 years ago

gyfangel commented 6 years ago

Hello, I get the following error when running infer.clonal.models(). Can you please help me fix this?

ERROR: No clonal models for sample: . Check data or remove this sample, then re-run.

The input includes information for 4 samples.

hdng commented 6 years ago

Could you attach some reproducible code and data?

gyfangel commented 6 years ago

Thank you very much for your reply. My code is:

    vaf.col.names <- grep(".vaf", colnames(dat), value=TRUE)
    x <- infer.clonal.models(variants=dat,
                             cluster.col.name="cluster",
                             vaf.col.names=vaf.col.names,
                             model="polyclonal",
                             subclonal.test="bootstrap",
                             subclonal.test.model="non-parametric",
                             cluster.center="mean",
                             num.boots=1000,
                             founding.cluster=5,
                             min.cluster.vaf=0.01,
                             p.value.cutoff=0.01,
                             alpha=0.1,
                             random.seed=63108)

The input data 'dat' is:

      cluster  D1.vaf  D2.vaf  D3.vaf  D4.vaf D1.ref.count D1.var.count D2.ref.count D2.var.count D3.ref.count D3.var.count D4.ref.count D4.var.count
            2  0.0000  0.0000 27.4194 26.9841           45 ...
    2       2  0.0000  0.0000 26.0274 40.0000           49...
    3       5  6.2500  3.1250 10.0000  5.8824           30...
    4       5  7.5000  7.6923 12.2449  2.7027           37...
    5       3 14.6341 33.3333  0.0000  0.0000           35...
    6       5 10.7143  5.1282 21.7391  5.8824           25...
    7       1  0.0000  0.0000 36.6667 58.9744           52...
    8       3 21.8750 35.8974  0.0000  0.0000           25...
    9       2  0.0000  0.0000 10.0000  0.0000           33...

The error is then "No clonal models for sample: D4.vaf". Thank you, and I hope for your reply. Gao

gyfangel commented 6 years ago

Sorry, about the data 'dat': the first column is not used.

hdng commented 6 years ago

First (and probably most important), you have only 9 variants, and their sequencing depth is low. Hence, the clusters may well underestimate the actual number and cellular prevalence of clones. I am not sure what data types you have, but if you can, try getting more variants.

Second, are there any copy number events that distort VAF (e.g. do you have variants whose VAF far exceeds that of the founding cluster)? The error message "No clonal models for sample.." means that there exists no ordering of the clones that satisfies the sum rule in clonal ordering (e.g. the requirement that the founding clone have a "higher" VAF than its subclones).

Third, is there a particular reason you chose the polyclonal model? If so, the founding.cluster parameter is not used to infer the model, and the normal clone (number 0) will be automatically added and chosen as the founding clone.
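The VAF check described in the second point can be sketched in base R, using the VAF columns of the nine rows posted earlier in this thread. This is only a simplified necessary condition (the founding cluster's mean VAF should not be exceeded by another cluster's in any sample), not ClonEvol's actual bootstrap test. It assumes that the first posted row belongs to cluster 2, and that cluster 5 is the intended founding cluster, as in the monoclonal setup.

```r
# VAF columns (percent) copied from the rows posted above.
# Assumption: the first row, posted without a row index, is cluster 2.
dat <- data.frame(
  cluster = c(2, 2, 5, 5, 3, 5, 1, 3, 2),
  D1.vaf  = c(0.0000, 0.0000, 6.2500, 7.5000, 14.6341, 10.7143, 0.0000, 21.8750, 0.0000),
  D2.vaf  = c(0.0000, 0.0000, 3.1250, 7.6923, 33.3333, 5.1282, 0.0000, 35.8974, 0.0000),
  D3.vaf  = c(27.4194, 26.0274, 10.0000, 12.2449, 0.0000, 21.7391, 36.6667, 0.0000, 10.0000),
  D4.vaf  = c(26.9841, 40.0000, 5.8824, 2.7027, 0.0000, 5.8824, 58.9744, 0.0000, 0.0000)
)

vaf.cols <- grep("\\.vaf$", colnames(dat), value = TRUE)

# Mean VAF of each cluster in each sample (the cluster "centers").
centers <- aggregate(dat[vaf.cols], by = list(cluster = dat$cluster), FUN = mean)

founding <- 5  # assumed founding cluster, taken from the posted call
f <- centers[centers$cluster == founding, vaf.cols]
others <- centers[centers$cluster != founding, ]

# For each sample: does any other cluster have a higher mean VAF
# than the founding cluster? TRUE flags a sample that cannot be ordered.
violations <- sapply(vaf.cols, function(s) any(others[[s]] > f[[s]]))

print(centers)
print(violations)
```

A sample flagged TRUE here has at least one cluster whose mean VAF exceeds that of the assumed founding cluster, which is the kind of situation the error message points to.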

gyfangel commented 6 years ago

Thank you for your reply. My data include 42 SNVs from 30X WGS. First, I can get more variants. But regarding the depth, can I multiply every read count by 10? For example:

    D1.vaf  D1.ref.count  D1.var.count
    20      40            10
changed to
    20      400           100

Second, I have some SNVs with VAF close to 1.

Third, I tried monoclonal model too. But the error still exists.

Thank you again and best wishes! Gao
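One way to see what multiplying the read counts by 10 would do is to compare the binomial confidence interval of the VAF before and after scaling, using the example counts from the comment above. This is a minimal base R sketch; the helper name ci_width is made up for illustration, and whether ClonEvol itself uses the count columns is not stated in this thread.

```r
# Width (in percentage points) of the exact 95% binomial confidence
# interval for a VAF estimated from var.count reads out of the total depth.
# ci_width is a hypothetical helper, not part of any package.
ci_width <- function(var.count, ref.count) {
  depth <- var.count + ref.count
  ci <- binom.test(var.count, depth)$conf.int
  100 * (ci[2] - ci[1])
}

vaf.original <- 100 * 10 / (10 + 40)     # counts as sequenced
vaf.scaled   <- 100 * 100 / (100 + 400)  # counts multiplied by 10

width.original <- ci_width(10, 40)
width.scaled   <- ci_width(100, 400)

cat("VAF:", vaf.original, "vs", vaf.scaled, "\n")
cat("CI width:", width.original, "vs", width.scaled, "\n")
```

The VAF is unchanged, but the interval becomes narrower, so any tool that reads the counts would treat the scaled data as more certain than the sequencing actually supports.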

gyfangel commented 6 years ago

Hi, I have obtained more SNVs and used about 430 or 1500 SNVs as the input data. But the error still appears.

CNVs also affect the VAF (about 7 of 40 SNVs have a VAF close to 1). How can I solve this problem? I cannot remove these 7 SNVs because they are important. I did provide the CNV results as input when running sciClone.

I don't know how to choose the clonal model, so I tried both (polyclonal and monoclonal). But the error still appears.

I am very confused. And hope for your reply and help. Thank you. Gao

hdng commented 6 years ago

I suggest spending some time investigating the clustering and reclustering. Don't rush to run ClonEvol yet. Take a look at the ClonEvol vignette (https://github.com/hdng/clonevol/raw/master/vignettes/clonevol.pdf). There are some tips on choosing variants and evaluating your clustering prior to running ClonEvol (e.g. plotting them with the plot.variant.clusters function and visually inspecting the clusters).

gyfangel commented 6 years ago

Hi, thank you for your answers. I have read the ClonEvol vignette and tried several things to resolve the error "No clonal models for sample: D.vaf":

I. Because of the CNVs, I tried PyClone. However, the error still appears.
II. I got more SNVs and used a data set of about 400 SNVs. The error still appears.
III. The VAFs of some SNVs are affected by CNVs (e.g. ID=11). If I remove them, I can run ClonEvol successfully. But these SNVs are of great importance, so I have to keep them, and then ClonEvol gives the error.
IV. For the sciClone result, I tried dividing the VAF by 2 where CN=1. The error "No clonal models for D.vaf" still appeared.

The attached file is the input file for clonevol (generated by sciClone, depth_cut_off=10) and code:

    x <- infer.clonal.models(variants=dat,
                             cluster.col.name="cluster",
                             vaf.col.names=vaf.col.names,
                             model="polyclonal",
                             subclonal.test="bootstrap",
                             subclonal.test.model="non-parametric",
                             cluster.center="mean",
                             num.boots=1000,
                             founding.cluster=1,
                             min.cluster.vaf=0.01,
                             p.value.cutoff=0.1,
                             alpha=0.1)

input.txt

Sorry to bother you again. And hope for your reply. Thank you very much. Gao

hdng commented 6 years ago

Could you plot the clusters (see example in the vignette) and attach the plot?

gyfangel commented 6 years ago

Thank you very much for your reply. The following is the plot. "box-5 clusters.pdf" is based on the result from sciclone (depth=10, max.cluster.num=10) and the "box-7 clusters.pdf" is based on the result from sciclone (depth=10, max.cluster.num=20).

With both clustering results, no model can be found for sample D. When I used only samples "A B C", the error became "no clonal model for sample C".

box-5clusters.pdf box-7clusters.pdf

Thank you again for your help. Thank you. Gao

hdng commented 6 years ago

The clusters appear to be noisy. I would try running sciClone many times with different parameters to get a better clustering, where variants in the same cluster show similar VAF across samples. It is also critical to obtain the most reliable variant list. Many variants showing low VAF across samples are likely false positives.
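As a rough numeric complement to inspecting the plots, the within-cluster spread of VAF can be tabulated per sample. The sketch below uses base R only and the rows for clusters 2 and 3 from the nine-variant table posted earlier in this thread; the helper name cluster_spread is made up for illustration, and the range (max minus min) is just one possible measure of how "similar" VAFs in a cluster are.

```r
# Range of VAF (max - min) within each cluster, per sample.
# cluster_spread is a hypothetical helper, not a ClonEvol or sciClone function.
cluster_spread <- function(dat, cluster.col.name, vaf.col.names) {
  aggregate(dat[vaf.col.names],
            by = list(cluster = dat[[cluster.col.name]]),
            FUN = function(v) max(v) - min(v))
}

# Clusters 2 and 3 from the table posted above (VAF in percent).
dat <- data.frame(
  cluster = c(2, 2, 2, 3, 3),
  D1.vaf  = c(0.0000, 0.0000, 0.0000, 14.6341, 21.8750),
  D2.vaf  = c(0.0000, 0.0000, 0.0000, 33.3333, 35.8974),
  D3.vaf  = c(27.4194, 26.0274, 10.0000, 0.0000, 0.0000),
  D4.vaf  = c(26.9841, 40.0000, 0.0000, 0.0000, 0.0000)
)
vaf.col.names <- grep("\\.vaf$", colnames(dat), value = TRUE)

spread <- cluster_spread(dat, "cluster", vaf.col.names)
print(spread)
```

A large range in a sample, such as cluster 2 in D4 where member VAFs run from 0 to 40, suggests the cluster mixes variants that do not belong together.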

gyfangel commented 6 years ago

Thank you for your answer. The SNVs are validated by single-cell PCR. I tried sciClone many times with different parameters, but I agree with you that the clusters are noisy. Thank you for your advice. I will remove some variants with low VAF; maybe that will give a good clustering result and ClonEvol will work well. Thank you very much ~_~ Gao

gyfangel commented 6 years ago

I have another question about PyClone + ClonEvol. I used CNVs and SNVs as input to PyClone, and the result contains the column "cellular_prevalence". I have seen another issue about this, where you advised dividing CCF by 2 for SNVs affected by a deletion. Is the CCF the "cellular_prevalence" column? And should all CCFs be divided by 2, or only those of the SNVs affected by CNVs? Thank you. My English is not good enough, so please forgive any inconvenience. Gao

hdng commented 6 years ago

You can divide CCF of all variants by 2 and feed it to vaf.col.names or feed CCF estimated from pyclone directly to ccf.col.names parameter in the infer.clonal.models function.
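The first option (dividing every CCF by 2) can be sketched in base R as follows. The data frame, sample names and the ".ccf" column suffix are made up for illustration. The sketch also assumes that the CCF values are fractions in [0, 1] and that VAF is expected in percent, which should be checked against the vignette.

```r
# Hypothetical per-variant CCF table (fractions in [0, 1]); the values and
# the column names A.ccf / B.ccf are illustrative, not real PyClone output.
dat <- data.frame(
  cluster = c(1, 1, 2),
  A.ccf   = c(0.98, 1.00, 0.40),
  B.ccf   = c(0.95, 0.97, 0.10)
)

ccf.col.names <- grep("\\.ccf$", colnames(dat), value = TRUE)
vaf.col.names <- sub("\\.ccf$", ".vaf", ccf.col.names)

# Option 1: convert every CCF to a VAF-like value (assumed percent scale)
# by dividing by 2, and pass the new columns via vaf.col.names.
dat[vaf.col.names] <- dat[ccf.col.names] * 100 / 2

print(dat)

# Option 2 (not run here): pass the CCF columns via ccf.col.names instead,
# as described in the reply above.
```

Either way, all variants are treated the same; the conversion is not restricted to the SNVs affected by CNVs.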