hdng / clonevol

Inferring and visualizing clonal evolution in multi-sample cancer sequencing
GNU General Public License v3.0

How to use option ignore.clusters in infer.clonal.models #23

Closed: xiongssg closed this issue 6 years ago.

xiongssg commented 6 years ago

Hi, I have a question about how to set the parameter 'ignore.clusters' in the infer.clonal.models function. First, I set it to a single cluster number, e.g. 3, but it didn't work. Then I defined a vector, e.g. IGNORE=c(3,4), and passed it to the function as ignore.clusters=IGNORE, but that reported an error too.

Thanks for any advice.

Best, Xiong

hdng commented 6 years ago

ignore.clusters should be a scalar or a vector of cluster IDs. I am not sure why it didn't work for you, but without reproducible code/data, I can't tell. Could you send code/data for this issue?

hdng commented 6 years ago

Btw, the ignore.clusters parameter does not do anything fancy compared to simply excluding the clusters you want to ignore from the input data frame. The only thing you lose by excluding them yourself is that, when clusters are passed through the ignore.clusters parameter, the cluster plot still shows them.
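
A minimal base R sketch of the alternative described above: drop the unwanted clusters from the variant data frame before calling infer.clonal.models. The toy data frame and the helper drop.clusters are hypothetical (not part of clonevol); the column name 'cluster' matches the cluster.col.name used later in this thread. Using %in% means the same call works whether you ignore one cluster or several.

```r
# Hypothetical toy variant table; only the 'cluster' column matters here.
x <- data.frame(
  cluster  = c(1, 1, 2, 3, 3, 4),
  R12A.ccf = c(40, 45, 80, 10, 12, 3),
  T12A.ccf = c(47, 42, 0, 46, 48, 5)
)

# Hypothetical helper: keep only rows whose cluster is NOT in 'ignore'.
# %in% accepts a scalar (3) or a vector (c(3, 4)) equally.
drop.clusters <- function(variants, ignore, cluster.col = "cluster") {
  variants[!(variants[[cluster.col]] %in% ignore), , drop = FALSE]
}

x.no3  <- drop.clusters(x, 3)        # 4 rows left: clusters 1, 1, 2, 4
x.no34 <- drop.clusters(x, c(3, 4))  # 3 rows left: clusters 1, 1, 2
```

As noted above, the trade-off is that clusters removed this way no longer appear in the cluster plot.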

xiongssg commented 6 years ago

Code:

    y = infer.clonal.models(variants = x, cluster.col.name = 'cluster',
        ccf.col.names = ccf.col.names, sample.groups = sample.groups,
        cancer.initiation.model='monoclonal', subclonal.test = 'bootstrap',
        subclonal.test.model = 'non-parametric', ignore.clusters=3,
        num.boots = 1000, founding.cluster = 4, cluster.center = 'mean',
        ignore.clusters = NULL, clone.colors = clone.colors,
        min.cluster.vaf = 0.01, sum.p = 0.05, alpha = 0.05)

Data:

                 cluster R12A.ccf T12A.ccf R12A.vaf T12A.vaf
    10:124746153       1     0.12    47.18      0.1    28.57
    10:93271811        1     0.12    42.54      0.1    18.18
    10:93751992        1     0.12    40.78      0.1    12.26
    11:115099752       1     0.12    43.58      0.1    20.83
    11:22272266        1     0.12    43.71      0.1    23.24
    11:5080533         1     0.12    46.87      0.1    11.16

And the error:

    Error in infer.clonal.models(variants = x, cluster.col.name = "cluster", :
      formal argument "ignore.clusters" matched by multiple actual arguments

If I just exclude the ignored clusters from the input data frame, I have to renumber the remaining clusters to make the IDs contiguous, so the output plot will no longer match the plot generated by PyClone.
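
If contiguous cluster IDs are needed after dropping clusters, one option (a sketch, not something clonevol itself provides) is to renumber with base R's match() and keep a lookup table, so the new IDs can be translated back to PyClone's numbering when comparing plots. The cluster IDs below are hypothetical.

```r
# Hypothetical PyClone cluster IDs remaining after dropping clusters 3 and 4.
pyclone.id <- c(1, 1, 2, 5, 5, 6)

# Renumber to 1..k, preserving the original ordering of the IDs.
old.ids <- sort(unique(pyclone.id))
new.id  <- match(pyclone.id, old.ids)   # 1 1 2 3 3 4

# Lookup table for translating the new labels back to PyClone cluster IDs.
id.map <- data.frame(clonevol = seq_along(old.ids), pyclone = old.ids)
id.map$pyclone[new.id]                  # recovers the original PyClone IDs
```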

hdng commented 6 years ago

You have two "ignore.clusters" arguments in your command:

    ignore.clusters=3, ignore.clusters = NULL,
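
The error above comes from R's own argument matching, not from clonevol: naming the same formal argument twice in one call is rejected before the function body runs. A minimal base R reproduction, using a hypothetical stand-in function f with a single ignore.clusters argument:

```r
# Hypothetical stand-in with one formal argument named like clonevol's.
f <- function(ignore.clusters = NULL) ignore.clusters

# Naming the argument twice fails during argument matching.
msg <- tryCatch(f(ignore.clusters = 3, ignore.clusters = NULL),
                error = function(e) conditionMessage(e))
msg   # formal argument "ignore.clusters" matched by multiple actual arguments

# Passing it once works, whether the value is a scalar or a vector.
f(ignore.clusters = 3)
f(ignore.clusters = c(3, 4))
```

Removing one of the two ignore.clusters entries from the original call is enough to get past this error.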

xiongssg commented 6 years ago

Thanks, I took your advice and excluded the ignored clusters from the input data frame. ClonEvol then identified one clonal model, like this:

      lab   color parent excluded            sample leaf.of.sample is.term
    1   1 #cccccc      4    FALSE          T12A.ccf       T12A.ccf    TRUE
    2   2 #a6cee3      3    FALSE          R12A.ccf       R12A.ccf    TRUE
    3   3 #b2df8a      4    FALSE T12A.ccf,R12A.ccf       T12A.ccf    TRUE
    4   4 #cab2d6     -1    FALSE T12A.ccf,R12A.ccf                  FALSE
      sample.with.cell.frac.ci
    1 T12A.ccf : 46-48.3%(0.95)/p=0.000
    2 R12A.ccf : 82.2-84.5%(0.95)/p=0.000
    3 T12A.ccf : 45.6-50.1%(0.95)/p=0.000,R12A.ccf : 11.4-13.9%(0.95)/p=0.000
    4 T12A.ccf : 2.2-7.8%(0.95)/p=0.000,R12A.ccf : 3.4-4.7%(0.95)/p=0.000
      sample.with.nonzero.cell.frac.ci
    1 T12A.ccf : 46-48.3%(0.95)/p=0.000
    2 R12A.ccf : 82.2-84.5%(0.95)/p=0.000
    3 T12A.ccf : 45.6-50.1%(0.95)/p=0.000,R12A.ccf : 11.4-13.9%(0.95)/p=0.000
    4 T12A.ccf : 2.2-7.8%(0.95)/p=0.000,R12A.ccf : 3.4-4.7%(0.95)/p=0.000
      sample.with.nonzero.cell.frac.noci sample.group sample.group.color
    1                           T12A.ccf         T12A                red
    2                           R12A.ccf         R12A               blue
    3                  T12A.ccf,R12A.ccf    R12A,T12A              green
    4                  T12A.ccf,R12A.ccf    R12A,T12A              green
      num.samples leaf.of.sample.count clone.ccf.combined.p branches  blengths
    1           1                    1                    0        1 10.049876
    2           1                    1                    0       20 10.583005
    3           2                    1                    0        2  6.928203
    4           2                    0                    0        Y 13.928388
      samples.with.nonzero.cell.frac node.border.color node.border.width
    1                       T12A.ccf             black                 1
    2                       R12A.ccf             black                 1
    3              T12A.ccf,R12A.ccf             black                 1
    4              T12A.ccf,R12A.ccf             black                 1
      branch.border.color branch.border.linetype branch.border.width
    1               white                  solid                 0.5
    2               white                  solid                 0.5
    3               white                  solid                 0.5
    4               white                  solid                 0.5

But when I moved on to plotting, it reported an error:

    Error in x0[i] <- x1[which(x$branches == parent)] : replacement has length zero

I checked your source code and found the likely cause: parent <- substr(x$branches[i], 1, nchar(x$branches[i])-1). This doesn't work for branch Y. So I had to manipulate the tree, moving the branch Y row to be the first row of y$matched$merged.trees[[1]]. Then it worked.
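
A base R sketch of the failure mechanism described above, on a toy copy of the branches column from the printed model. It assumes only the parent-lookup expression quoted in the comment; how clonevol's plotting code handles the root row beyond that is not shown in this thread. For the one-character root ID "Y", dropping the last character yields an empty string, which() finds no match, and assigning that zero-length result into a single element raises the same "replacement has length zero" error. The reordering at the end mirrors the reporter's workaround of moving the Y row to the top.

```r
# Toy copy of the 'branches' column from the model printed above.
tree <- data.frame(lab = c(1, 2, 3, 4),
                   branches = c("1", "20", "2", "Y"),
                   stringsAsFactors = FALSE)

# Parent lookup as quoted above: drop the last character of the branch ID.
parent.of <- function(b) substr(b, 1, nchar(b) - 1)
parent.of("20")   # "2"
parent.of("Y")    # "" (the root has no parent ID)

# With an empty parent ID, which() returns integer(0), so the right-hand
# side has length zero and the single-element assignment fails.
x1 <- c(10, 20, 30, 40)   # arbitrary coordinates, for illustration only
x0 <- numeric(nrow(tree))
err <- tryCatch({
  x0[4] <- x1[which(tree$branches == parent.of("Y"))]
  NULL
}, error = function(e) conditionMessage(e))
err               # "replacement has length zero"

# Workaround: move the root ("Y") row to the top; order() keeps ties stable.
tree.fixed <- tree[order(tree$branches != "Y"), ]
tree.fixed$branches   # "Y" "1" "20" "2"
```

On the result object in this thread, the equivalent reordering would presumably be applied to y$matched$merged.trees[[1]], assuming it is a data frame with a branches column as the printout suggests.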

hdng commented 6 years ago

This may be an old bug triggered by founding.cluster not being equal to 1. I'll take a look. Thanks for reporting.

xiongssg commented 6 years ago

Sorry, I have another question about the parameters. In the Supplementary Methods of your paper, you say ClonEvol offers an option for multi-region samples, but I didn't find detailed information in the tutorial. I guessed that I could use sample.groups for this analysis. Suppose I have 5 samples from one patient, e.g. 3 primaries and 2 metastases: T1, T2, T3, L1, L2. How should I define sample.groups?

hdng commented 6 years ago

Sorry, it was never documented, but it is available. Can you try something like this? You'll need vaf.col.names, ref.col.names (reference read counts), and var.col.names (variant read counts) for the samples.

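    # Logical index of the primaries (sample.groups in the same order as vaf.col.names)
    primaries = sample.groups=='p'
    # Merge the primaries into one new sample 'P' in sample group 'p'
    y2 = merge.samples(y, samples=vaf.col.names[primaries],
        new.sample='P', new.sample.group='p',
        ref.cols=ref.col.names[primaries],
        var.cols=var.col.names[primaries])
    # Convert the clone trees to branch-based trees (sqrt-scaled branch lengths)
    y2 = convert.consensus.tree.clone.to.branch(y2, branch.scale='sqrt')
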
hdng commented 6 years ago

You'll need to install the latest version that I pushed minutes ago.

xiongssg commented 6 years ago

Thanks for your advice. So this function merges multiple primaries into one primary sample, right? And I can define:

    vaf.col.names <- c("T1","T2","T3","L1","L2")
    sample.groups <- c("p","p","p","m","m")
    names(sample.groups) <- vaf.col.names

Then I can use the code you posted above to merge the 3 primaries into one single sample?
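
A base R sketch of the inputs for this five-sample example, assuming (as in the maintainer's snippet) that vaf.col.names, ref.col.names and var.col.names are parallel character vectors in the same sample order as sample.groups. The ".ref"/".var" column suffixes are hypothetical; use whatever read-count column names your data frame actually has. The logical index primaries then selects the matching entries from all three vectors.

```r
vaf.col.names <- c("T1", "T2", "T3", "L1", "L2")
sample.groups <- c("p", "p", "p", "m", "m")
names(sample.groups) <- vaf.col.names

# Hypothetical read-count column names: one ref and one var column per sample.
ref.col.names <- paste0(vaf.col.names, ".ref")
var.col.names <- paste0(vaf.col.names, ".var")

# Same logical index as in the maintainer's snippet; it only lines up
# correctly if sample.groups is in the same order as the name vectors.
primaries <- sample.groups == "p"
vaf.col.names[primaries]   # "T1" "T2" "T3"
ref.col.names[primaries]   # "T1.ref" "T2.ref" "T3.ref"
var.col.names[primaries]   # "T1.var" "T2.var" "T3.var"
```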

hdng commented 6 years ago

You also need to provide ref.cols and var.cols for the reference and variant read counts. The merge.samples function will aggregate read counts from the multi-region samples and rerun bootstrapping.
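
To illustrate what aggregating read counts means here, a base R sketch that pools reference and variant counts across the three primaries and recomputes one VAF per variant for the merged sample. The counts are hypothetical, and this shows only the pooling step; it is not clonevol's merge.samples code and does not rerun the bootstrap.

```r
# Hypothetical reference/variant read counts for 3 variants in 3 primaries.
counts <- data.frame(
  T1.ref = c(50, 60, 40), T1.var = c(50, 20, 10),
  T2.ref = c(70, 30, 45), T2.var = c(30, 30,  5),
  T3.ref = c(55, 80, 20), T3.var = c(45, 20, 20)
)
ref.cols <- c("T1.ref", "T2.ref", "T3.ref")
var.cols <- c("T1.var", "T2.var", "T3.var")

# Pool reads across regions, then recompute the VAF of the merged sample.
P.ref <- rowSums(counts[, ref.cols])
P.var <- rowSums(counts[, var.cols])
P.vaf <- P.var / (P.ref + P.var)   # 125/300, 70/240, 35/140
```

Pooling counts weights each region by its depth, so it differs from averaging per-region VAFs whenever depths differ: for the second variant, averaging 0.25, 0.5 and 0.2 gives about 0.317, while pooling gives 70/240, about 0.292.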

xiongssg commented 6 years ago

Thanks. I'll try it on my data.