hdng / clonevol

Inferring and visualizing clonal evolution in multi-sample cancer sequencing
GNU General Public License v3.0

NA error #2

Open nesilin opened 7 years ago

nesilin commented 7 years ago

Hi!

I've recently used SciClone on a primary/relapse sample pair. In the output table there are some NA values which, as far as I know, correspond to mutations that are not shared between the samples.

Here is an example of the data frame:

    chr       st primary.ref primary.var primary.vaf primary.cn
100   1 56790773           0           0        0.00         NA
101   1 57427557          39          18       31.58          2
102   1 58035059          22           9       29.03          2
    primary.cleancn primary.depth relapse.ref relapse.var relapse.vaf
100              NA             0          47          19       28.79
101               2            57          42          18       30.00
102               2            31          29           8       21.62
    relapse.cn relapse.cleancn relapse.depth adequateDepth cluster
100          2               2            66             0      NA
101          2               2            60             1       2
102          2               2            37             1       2
    cluster.prob.1 cluster.prob.2
100             NA             NA
101   0.0156438858      0.9843561
102   0.0002460844      0.9997539

If I try to use infer.clonal.models with these results as:

>df
    cluster primary.vaf primary.depth relapse.vaf relapse.depth
100      NA        0.00             0       28.79            66
101       2       31.58            57       30.00            60
102       2       29.03            31       21.62            37

x <- infer.clonal.models(variants=df,
                         cluster.col.name="cluster",
                         vaf.col.names=vaf.col.names,
                         subclonal.test="none",
                         subclonal.test.model="none",
                         cluster.center="mean",
                         model = 'monoclonal',
                         vaf.in.percent = TRUE,
                         founding.cluster=1,
                         min.cluster.vaf=0.01,
                         p.value.cutoff=0.05)

I got the following error:

Sample 1: primary.vaf <-- primary.vaf
Sample 2: relapse.vaf <-- relapse.vaf
Using monoclonal model
primary.vaf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters: NA,NA 
Error in if (v[i, ]$excluded) { : missing value where TRUE/FALSE needed

So I removed the rows containing NA:

> df <- na.omit(df)

Then I ran it again and got a different error:

Sample 1: primary.vaf <-- primary.vaf
Sample 2: relapse.vaf <-- relapse.vaf
Using monoclonal model
primary.vaf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters:  
primary.vaf : 1 clonal architecture model(s) found

relapse.vaf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters:  
relapse.vaf : 1 clonal architecture model(s) found

Finding matched clonal architecture models across samples...
Found  1 compatible model(s)
Merging clonal evolution trees across samples...
Error in ci$sample.with.cell.frac.ci[cia$is.zero.cell.frac] = paste0("°",  : 
  NAs are not allowed in subscripted assignments

I would be grateful if you could help me solve this error. Apart from that, after running SciClone, is it recommended to do the subclonal test with bootstrapping? What is the point of running clonevol with subclonal.test="bootstrap" and subclonal.test.model="non-parametric"?

If the idea is to run fishplot after clonevol, should I use the rescale.vaf function? If so, how?

Thank you in advance!
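
The first error above ("missing value where TRUE/FALSE needed") appears right after the log lists "Non-positive VAF clusters: NA,NA", which suggests that the NA cluster label is involved. A minimal sketch, assuming the goal is only to drop variants that were not assigned to any cluster: filter on the cluster column alone, because na.omit() on the whole table also discards rows that have NA in unrelated columns (primary.cn, for example). The toy data frame repeats the three rows shown in the post; the names clustered and n.dropped are illustrative.

```r
# Toy data frame repeating the three rows shown in the post above
df <- data.frame(
  cluster       = c(NA, 2, 2),
  primary.vaf   = c(0.00, 31.58, 29.03),
  primary.depth = c(0, 57, 31),
  relapse.vaf   = c(28.79, 30.00, 21.62),
  relapse.depth = c(66, 60, 37),
  row.names     = c("100", "101", "102")
)

# Keep only variants that were assigned to a cluster;
# NA values in other columns do not affect this filter
clustered <- df[!is.na(df$cluster), , drop = FALSE]

# Report how many variants were dropped so the loss is visible
n.dropped <- nrow(df) - nrow(clustered)
message(n.dropped, " variant(s) without a cluster assignment removed")
```

Whether unclustered variants should be dropped or reassigned is a judgment call that depends on the data; this only makes the filtering explicit.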

hdng commented 7 years ago

This may be a bug. Could you share some reproducible data and code? Thanks.

nesilin commented 7 years ago

I cannot share the data, but I will try to give you a dataset with the same problem.

nesilin commented 7 years ago

Hi!

The attached file contains the SciClone results for a primary/relapse sample pair from this paper: http://www.pnas.org/content/113/40/11306 . I ran clonevol like this:

library(clonevol)
library(fishplot)

df = read.table('results.txt', header=TRUE, sep = '\t')
df <- na.omit(df)
# select columns whose names end in ".vaf" (the dot is escaped so it matches literally)
vaf.col.names <- grep("\\.vaf$", colnames(df), value = TRUE)

x <- infer.clonal.models(variants=df,
                         cluster.col.name="cluster",
                         vaf.col.names=vaf.col.names,
                         subclonal.test="none",
                         subclonal.test.model="none",
                         cluster.center="mean",
                         model = 'monoclonal',
                         vaf.in.percent = TRUE,
                         founding.cluster=1,
                         min.cluster.vaf=0.01,
                         p.value.cutoff=0.05)

and it gave the same error:

Sample 1: primary.vaf <-- primary.vaf
Sample 2: relapse.vaf <-- relapse.vaf
Using monoclonal model
primary.vaf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters:  
primary.vaf : 2 clonal architecture model(s) found

relapse.vaf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters:  
relapse.vaf : 2 clonal architecture model(s) found

Finding matched clonal architecture models across samples...
Found  2 compatible model(s)
Merging clonal evolution trees across samples...
Error in ci$sample.with.cell.frac.ci[cia$is.zero.cell.frac] = paste0("°",  : 
  NAs are not allowed in subscripted assignments

Thanks! It would also be great if you could tell me when the bootstrap test is worth running and whether I should use it in this case. Attachment: results.txt

hdng commented 7 years ago

It turned out that the bootstrap was not performed, so no cellular fraction was estimated, and that is what produced this error. I'll add this to the list of bugs to be fixed. Thanks for your report.

I would recommend using the bootstrap, because it allows the cellular fraction to be estimated and used to interpret the models. I am working on the technical details and will post them soon.

Thanks.
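
A minimal sketch of the call with the bootstrap enabled, assuming the same arguments as the call posted earlier in this thread and changing only subclonal.test and subclonal.test.model to the values mentioned in the original question. The wrapper run.bootstrap.models is a hypothetical helper, not part of clonevol, and argument names such as model are copied from the thread and may differ between clonevol versions.

```r
# Hypothetical wrapper (not part of clonevol) around infer.clonal.models,
# reusing the arguments from the call posted earlier in this thread
run.bootstrap.models <- function(variants, vaf.col.names, min.cluster.vaf = 0.01) {
  if (!requireNamespace("clonevol", quietly = TRUE)) {
    stop("the clonevol package is not installed")
  }
  clonevol::infer.clonal.models(
    variants             = variants,
    cluster.col.name     = "cluster",
    vaf.col.names        = vaf.col.names,
    subclonal.test       = "bootstrap",       # was "none" in the failing call
    subclonal.test.model = "non-parametric",  # was "none" in the failing call
    cluster.center       = "mean",
    model                = "monoclonal",
    vaf.in.percent       = TRUE,
    founding.cluster     = 1,
    min.cluster.vaf      = min.cluster.vaf,
    p.value.cutoff       = 0.05
  )
}
```

It would be called as run.bootstrap.models(df, vaf.col.names) on a data frame that has a cluster column and the listed VAF columns.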

sophiespo commented 7 years ago

Did you ever fix this bug? I am having the same issue.

hdng commented 7 years ago

Sorry, not yet. Is there a specific reason why you chose not to run the bootstrap?

sophiespo commented 7 years ago

Sorry - I thought I was. I had subclonal.test="bootstrap" but I hadn't changed the value of min.cluster.vaf to NULL. Once I did then the error went away.

hdng commented 7 years ago

I don't think min.cluster.vaf is the root cause. What did you use as min.cluster.vaf before?

sophiespo commented 7 years ago

It was the value copied straight from the usage details -0.01.

I've run clonevol for a few samples now, using PyClone results as the input, but I can't determine any clonal models for any sample. I am running this for single samples (none of my samples are related). Would this be causing the problems? Can clonevol work using single samples?

hdng commented 7 years ago

You meant 0.01, correct? For PyClone, please see https://github.com/hdng/clonevol/issues/4. I often see PyClone produce too many clusters (possibly because of parameter settings). Before running clonevol, it is important to evaluate the clusters visually and clean them up (e.g. by removing clusters with a small number of variants, or clusters that look like outliers relative to the others), or even to rerun PyClone a couple of times to obtain the best clustering.
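
One of the clean-up steps mentioned above, removing clusters with few variants, could be sketched in base R as follows. The data, the column name sample.vaf, and the cutoff min.variants are all made up for illustration. The final renumbering assumes that clonevol expects consecutive integer cluster ids starting at 1, which is worth checking against the vignette.

```r
# Made-up clustering result: cluster 3 contains a single variant
variants <- data.frame(
  cluster    = c(1, 1, 1, 1, 2, 2, 2, 3),
  sample.vaf = c(48, 51, 47, 50, 22, 25, 24, 9)
)

# Hypothetical cutoff: drop clusters with fewer than this many variants
min.variants <- 2

# Count variants per cluster and keep the ids that pass the cutoff
cluster.sizes <- table(variants$cluster)
keep <- names(cluster.sizes)[cluster.sizes >= min.variants]
cleaned <- variants[as.character(variants$cluster) %in% keep, , drop = FALSE]

# Renumber remaining clusters as 1..k (assumption: consecutive ids are expected)
cleaned$cluster <- match(cleaned$cluster, sort(unique(cleaned$cluster)))
```

A count cutoff does not detect outlier clusters, so it complements the visual check rather than replacing it.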

nesilin commented 7 years ago

My experience with PyClone is that it tends to overestimate the number of clones/clusters in a sample unless you have the type of input data PyClone was designed for. I would not use PyClone unless I had a deep targeted sequencing gene panel that yields between 100 and 1000 SNVs per sample with a mean coverage of 100x or more. Having allele-specific copy number estimates and the purity really helps PyClone produce good corrected VAFs (CCFs). Apart from that, sometimes it is just a matter of running more iterations until the chain converges...

anyone1985 commented 6 years ago

I have the same issue. When I ran the bootstrap, it reported that no model was found for one of my two samples. So I removed the bootstrap and ran into the same error.

hdng commented 6 years ago

I always recommend running the bootstrap. If it does not find a model, the non-bootstrap run won't find one either. When no model is found for a sample, it indicates either a clustering issue or that the data in that sample are noisier than clonevol's default tolerance.

To find out what may be wrong with the clustering, please see Step 3: Evaluating the variant clustering results of the vignette (https://raw.githubusercontent.com/hdng/clonevol/master/vignettes/clonevol.pdf).
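
As a quick numeric check alongside the vignette's plots, the per-cluster mean VAF in each sample can be tabulated in base R. This is a sketch on made-up data, not clonevol's own procedure. The threshold of 1 percent assumes that min.cluster.vaf=0.01 is a proportion applied to VAFs given in percent, which is an interpretation of the log message in this thread and not a confirmed detail.

```r
# Made-up variants: cluster 2 is nearly absent at relapse, cluster 3 is absent in the primary
variants <- data.frame(
  cluster     = c(1, 1, 2, 2, 3),
  primary.vaf = c(45, 47, 30, 28, 0),
  relapse.vaf = c(44, 46, 0, 1, 25)
)
vaf.col.names <- c("primary.vaf", "relapse.vaf")

# Mean VAF of each cluster in each sample (the "mean" cluster center)
centers <- aggregate(variants[vaf.col.names],
                     by  = list(cluster = variants$cluster),
                     FUN = mean)

# Assumed threshold in percent (0.01 as a proportion); flag low cluster/sample pairs
threshold.percent <- 1
low <- centers[vaf.col.names] < threshold.percent
rownames(low) <- centers$cluster
print(centers)
print(low)
```

Clusters flagged in every sample, or with centers that barely differ from another cluster, are candidates for a closer look in the clustering plots.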