hdng / clonevol

Inferring and visualizing clonal evolution in multi-sample cancer sequencing
GNU General Public License v3.0

CCF values in AML1 dataset #15

Closed andhena closed 6 years ago

andhena commented 6 years ago

Hi, I tried to run ClonEvol with the tutorial and I have a question about the cancer cell fraction (CCF) values in the aml1 dataset. I understand that CCF is the fraction of tumor cells carrying the mutation, and that for diploid heterozygous variants it can be calculated as twice the VAF. I noticed that the values in the P.ccf column range from 0 to 129.58. Is a CCF value greater than 100 meaningful? I am not familiar with this type of calculation (yet), and it would be very helpful if you could explain it to me. Thank you!

hdng commented 6 years ago

Hi @andhena,

ClonEvol's actual calculation uses VAF, which ranges from 0 to 100. The CCF that reflects the underlying VAF calculation is estimated as 2xVAF, so it ranges from 0 to 200 and should be treated as an estimate of the actual CCF. Anything above 100 indicates error in the estimate, with the extreme being 200. The mean/median over a cluster's variants should give a better estimate of the cluster's CCF. Regardless, the ClonEvol model operates in VAF space, and VAF can be calculated directly from read counts. If you provide CCF (corrected for copy number alteration), it accepts CCF values in the range 0-200, as provided by other tools such as PyClone and Dirichlet process clustering (see Fig. 1, which shows CCF from 0-2, in this paper: http://www.nature.com/articles/nature14347).
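
To make the arithmetic concrete, here is a minimal Python sketch of the 2xVAF estimate and the per-cluster mean/median. It is not ClonEvol code: the function name `vaf_to_ccf` and the VAF values are hypothetical, chosen only for illustration, and it assumes heterozygous variants in diploid regions.

```python
from statistics import mean, median


def vaf_to_ccf(vaf_percent):
    """Estimate CCF (%) as 2 x VAF (%), assuming a heterozygous diploid variant."""
    return 2.0 * vaf_percent


# Hypothetical VAFs (%) for the variants of one cluster; not taken from the aml1 data.
cluster_vafs = [40.0, 45.0, 48.0, 52.0]

# Per-variant CCF estimates; individual values may exceed 100 because of sampling noise.
ccf_estimates = [vaf_to_ccf(v) for v in cluster_vafs]
over_100 = [c for c in ccf_estimates if c > 100.0]

# Summaries over the whole cluster are steadier than any single variant.
cluster_ccf_mean = mean(ccf_estimates)
cluster_ccf_median = median(ccf_estimates)

print(ccf_estimates, over_100, cluster_ccf_mean, cluster_ccf_median)
```

In this made-up cluster one variant has an estimated CCF above 100, while the cluster mean and median both stay below 100, which is why the cluster-level summary is the more useful number.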

andhena commented 6 years ago

Thank you for the clarification @hdng