morrislab / phylowgs

Application for inferring subclonal composition and evolution from whole-genome sequencing data.
GNU General Public License v3.0

Questions about pseudo-SSM variant read counts #137

HushWay closed this issue 2 years ago.

HushWay commented 2 years ago

Dear sir,

Your article says that "we create an equivalent pseudo-SSM with population frequency φi by adding an SSM to the dataset with total reads di and variant reads di × φi/2 rounded to the nearest whole number". This confuses me, because I thought that it is the cancer cell fraction of a heterozygous mutation, not its population frequency, that equals twice the VAF. Unless I am mistaken, your population frequency refers to the fraction of all cells, including normal cells. How should the formula be interpreted?

Thanks for your attention, Liuwei

quaidmorris commented 2 years ago

The population frequency of a mutation is the percentage (or proportion) of cells in the sample that have the mutation. So if p% of cells have the heterozygous mutation, then the VAF, expressed as a percentage, will be p/2%: the cells with the mutation together contribute p% of the alleles at that locus in the population, and half of the alleles these cells contribute carry the mutation. This assumes that the locus is diploid in both the normal and the mutated cells.
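To make the allele-counting argument concrete, here is a minimal Python sketch. It assumes, as the reply does, that every cell is diploid at the locus and that each mutated cell is heterozygous (one mutant allele out of two). The function name `expected_vaf` is hypothetical and is not part of PhyloWGS.

```python
from fractions import Fraction


def expected_vaf(n_cells: int, n_mutated: int) -> Fraction:
    """Expected variant allele frequency at a diploid locus (hypothetical helper).

    Assumes every cell carries two alleles at the locus and each
    mutated cell is heterozygous (exactly one mutant allele).
    """
    if n_cells <= 0 or not 0 <= n_mutated <= n_cells:
        raise ValueError("need n_cells > 0 and 0 <= n_mutated <= n_cells")
    total_alleles = 2 * n_cells      # every cell contributes two alleles
    mutant_alleles = n_mutated       # one mutant allele per mutated cell
    return Fraction(mutant_alleles, total_alleles)


# 30 of 100 cells (population frequency 0.3) carry the mutation.
phi = Fraction(30, 100)
vaf = expected_vaf(100, 30)
print(vaf)  # 3/20, i.e. 0.15
assert vaf == phi / 2
```

Exact fractions are used so that the equality VAF = φ/2 can be checked without floating-point noise.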

If the VAF (as a percentage) is p/2% and there are di reads mapping to that locus, then, on average, we would expect di × (p/2)% of them to contain the mutation, i.e. di × φi/2 when the population frequency is written as a proportion φi = p/100.
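The same reasoning gives the pseudo-SSM read counts quoted from the paper in the question: with population frequency φi written as a proportion (φi = p/100), the expected number of variant reads is di × φi/2. The sketch below is only an illustration and is not PhyloWGS's own code. The function name `pseudo_ssm_reads` is hypothetical, and it rounds halves upward, which is one possible reading of "rounded to the nearest whole number" (Python's built-in `round` would instead send halves to the nearest even integer).

```python
import math


def pseudo_ssm_reads(total_reads: int, phi: float) -> tuple[int, int]:
    """Return (variant_reads, total_reads) for a hypothetical pseudo-SSM.

    phi is the population frequency as a proportion in [0, 1]. Assumes a
    diploid locus with a heterozygous mutation, so the expected VAF is phi / 2.
    """
    if total_reads < 0 or not 0.0 <= phi <= 1.0:
        raise ValueError("total_reads must be >= 0 and phi must lie in [0, 1]")
    expected_variant = total_reads * phi / 2
    # Round half up (an assumption); floor(x + 0.5) does this for x >= 0.
    variant_reads = math.floor(expected_variant + 0.5)
    return variant_reads, total_reads


print(pseudo_ssm_reads(1000, 0.3))  # (150, 1000)
print(pseudo_ssm_reads(25, 0.5))    # 6.25 rounds to 6 -> (6, 25)
print(pseudo_ssm_reads(10, 0.5))    # 2.5 rounds up to 3 -> (3, 10)
```

The tie case (2.5 expected reads) is where the rounding convention matters; `round(2.5)` would give 2.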

Cancer cell fraction (CCF) normally means the proportion of cells in the population that are cancerous. So if the mutation in question is clonal (i.e., present in all cancer cells), then the CCF will be p% and the VAF of that clonal mutation will indeed be p/2%.

However, the relationship between population frequency and VAF still holds if the mutation is subclonal, i.e., p% < CCF.
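A short numerical illustration of the clonal and subclonal cases, using the reply's terminology. The sample is hypothetical: 80% of its cells are assumed to be cancerous. A clonal mutation then has population frequency 0.8, while a subclonal one present in, say, half of the cancer cells has population frequency 0.4. In both cases the expected VAF is just the population frequency divided by two. The numbers are made up, and the diploid, heterozygous assumption still applies.

```python
def vaf_from_population_frequency(phi: float) -> float:
    """Expected VAF for a heterozygous mutation at a diploid locus."""
    return phi / 2


cancer_fraction = 0.8                  # hypothetical: 80% of cells are cancerous
clonal_phi = cancer_fraction           # clonal: present in every cancer cell
subclonal_phi = 0.5 * cancer_fraction  # subclonal: present in half of the cancer cells

for label, phi in [("clonal", clonal_phi), ("subclonal", subclonal_phi)]:
    vaf = vaf_from_population_frequency(phi)
    print(f"{label}: population frequency {phi:.2f}, expected VAF {vaf:.2f}")
# clonal: population frequency 0.80, expected VAF 0.40
# subclonal: population frequency 0.40, expected VAF 0.20
```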

Q


--

Quaid Morris, PhD
Member, Computational and Systems Biology, Memorial Sloan Kettering Cancer Center
CCAI Chair, Vector Institute Faculty (on leave)

HushWay commented 2 years ago

Dear Quaid Morris,

Thanks for your patience and clear explanation. I had misunderstood the concepts of CCF and population frequency.