kharchenkolab / numbat

Haplotype-aware CNV analysis from single-cell RNA-seq
https://kharchenkolab.github.io/numbat/
Other
164 stars 23 forks

Account for WGD #36

Open teng-gao opened 2 years ago

teng-gao commented 2 years ago

Ideally, we should account for WGD (a baseline of 4 copies) when analyzing hyperploid tumors (e.g. TNBC5), which should improve model fit and prediction stability.

DarioS commented 1 year ago

Isn't the baseline usually at three copies because of fitness? I have a cancer type in which 70% of samples have whole-genome duplication, and the genome ploidy from PURPLE is rarely near four but often close to three, so I am looking forward to this enhancement!

[image]

Whereas chromosome losses are rarely tolerated in diploid cells, they occur frequently in tetraploid cells and can promote cancer formation.

teng-gao commented 1 year ago

> Isn't the baseline usually at three copies because of fitness? I have a cancer type in which 70% of samples have whole-genome duplication, and the genome ploidy from PURPLE is rarely near four but often close to three, so I am looking forward to this enhancement!

Right, although tumors with ploidy near 3 should be OK unless no diploid regions are available anywhere in the genome. The issue only arises when the balanced segments with the lowest copy number have 4 copies.
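To make the condition above concrete, here is a minimal Python sketch. It is not numbat code: the `Segment` type, the function names, and the example copy-number states are all hypothetical, and it assumes allele-specific copy numbers per segment are already known (for example from WGS). It simply looks for the lowest-copy balanced segment and checks whether any diploid (1:1) region is available to serve as a baseline.

```python
from typing import NamedTuple, Optional, Sequence


class Segment(NamedTuple):
    """Hypothetical allele-specific copy-number state of one genomic segment."""
    name: str
    major: int  # copies of the more abundant haplotype
    minor: int  # copies of the less abundant haplotype


def lowest_balanced_copies(segments: Sequence[Segment]) -> Optional[int]:
    """Return the smallest total copy number among balanced segments.

    A segment counts as balanced when both haplotypes have the same,
    non-zero number of copies (1:1, 2:2, ...). Returns None when no
    balanced segment exists.
    """
    balanced = [s.major + s.minor for s in segments
                if s.major == s.minor and s.major > 0]
    return min(balanced) if balanced else None


def diploid_baseline_available(segments: Sequence[Segment]) -> bool:
    """True when at least one 1:1 (2-copy) region can anchor the baseline."""
    return lowest_balanced_copies(segments) == 2


# Illustrative (made-up) tumors: a near-triploid genome that still has a
# diploid arm, and a WGD genome whose lowest balanced state is 2:2.
near_triploid = [Segment("1p", 2, 1), Segment("2q", 1, 1), Segment("3p", 2, 2)]
wgd_no_diploid = [Segment("1p", 2, 1), Segment("2q", 2, 2), Segment("3p", 3, 2)]

print(diploid_baseline_available(near_triploid))
print(diploid_baseline_available(wgd_no_diploid))
```

In this toy setup only the second genome triggers the problem described above, because its lowest balanced segments sit at 4 copies.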

DarioS commented 6 months ago

I noticed that non-WGD samples show near-perfect agreement with whole-genome sequencing (PURPLE), but WGD samples do not.

[image]

It seems implausible that all of a sample's arms would be lost or gained rather than a mix. Sample IDs ending in Bulk are WGS.
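One possible reading of this pattern, offered only as a toy illustration and not as a description of how numbat or PURPLE work, is that calls are made relative to an assumed baseline. The sketch below uses made-up absolute copy numbers and a hypothetical helper, `call_relative_to_baseline`, to show how the same arms can look uniformly gained under a 2-copy baseline and mixed under a 4-copy baseline.

```python
from typing import Dict


def call_relative_to_baseline(copies_by_arm: Dict[str, int],
                              baseline: int) -> Dict[str, str]:
    """Label each arm as gain, loss, or neutral relative to a baseline.

    copies_by_arm maps an arm name to its absolute total copy number;
    baseline is the copy number assumed to be "normal" for this sample.
    """
    calls = {}
    for arm, copies in copies_by_arm.items():
        if copies > baseline:
            calls[arm] = "gain"
        elif copies < baseline:
            calls[arm] = "loss"
        else:
            calls[arm] = "neutral"
    return calls


# Made-up absolute copy numbers for a WGD-like sample.
wgd_sample = {"1p": 3, "1q": 4, "2p": 3, "2q": 5}

calls_diploid_baseline = call_relative_to_baseline(wgd_sample, baseline=2)
calls_wgd_baseline = call_relative_to_baseline(wgd_sample, baseline=4)

print(calls_diploid_baseline)
print(calls_wgd_baseline)
```

With the 2-copy baseline every arm in this example is labeled a gain, whereas the 4-copy baseline yields a mix of losses, neutral arms, and gains.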