Purity Interpretation

Recently, two algorithms for analysing single-cell RNA sequencing data, CopyKAT and SCEVAN, have shown that some tumours are composed of two or three main clones. For example:

In one TNBC sample (TNBC1), the clustering of 797 aneuploid copy number profiles identified two major subclones (A, B) that comprised 44% and 28% of the tumor mass and were separated by two distinct lineages in neighbor-joining (NJ) tree. Clustered heat maps identified clonal amplifications (1q, 6p, 8q, 10p, 16p and 18p) and clonal deletions (1p, 4q, 5q, 8p, 10q, 13 and 14) that were shared across all tumor cells. The clustered heat maps of the consensus copy number profiles revealed subclonal CNA events, including subclonal amplifications in clone A (4p, 7q, 9p13.2–q22.2 and 17q) and subclonal amplifications in clone B (3p26.3–p25.1, 6q, 7p, 11q, Xp11.23 and Xq) that varied in the tumor mass.

How does that correspond to PURPLE's estimate of tumour purity? Would it be 44% + 28% = 72% (i.e. cancer cell fraction)? Or would it be reported as 44% (i.e. largest pure tumour group)? It might be useful to provide some clarifying sentences in the user guide.
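To make the two candidate readings concrete, here is a minimal Python sketch of the arithmetic only. It does not call PURPLE and says nothing about how PURPLE actually computes purity. The function names are hypothetical, and it takes the quoted 44% and 28% at face value as fractions of the same whole.

```python
# Subclone fractions quoted above for TNBC1 (clones A and B).
subclone_fractions = {"A": 0.44, "B": 0.28}


def combined_fraction(fractions):
    """Reading 1: sum the fractions of all tumour subclones."""
    return sum(fractions.values())


def largest_clone_fraction(fractions):
    """Reading 2: take only the largest single subclone."""
    return max(fractions.values())


if __name__ == "__main__":
    print(f"combined subclones: {combined_fraction(subclone_fractions):.0%}")
    print(f"largest subclone:   {largest_clone_fraction(subclone_fractions):.0%}")
```

Which of these two numbers, if either, matches the purity PURPLE reports is exactly what I would like the user guide to spell out.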

hartwigmedical / hmftools

Purity Interpretation #396