luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

Detecting heterozygous germline variants having undergone loss of heterozygosity in tumour samples #219

Open TBradley27 opened 2 years ago

TBradley27 commented 2 years ago

Describe the bug

Hello,

From what I can see, Octopus in tumour-only calling mode sometimes fails to detect heterozygous germline variants that have undergone loss of heterozygosity (LOH) during tumourigenesis, where the reference (wild-type) allele is completely absent from the tumour cells.

I think this is because the genotyping assumes that the copy number of germline haplotypes is stable, so these variants are removed by either the AF or the AFB filter.
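To illustrate why such variants stray from the allele frequency expected of a heterozygous germline call, here is a minimal sketch of the expected variant allele fraction in a tumour sample mixed with normal cells. It is not taken from Octopus: the function name `expected_vaf`, the purity values, and the two LOH scenarios (copy-neutral, and loss of the reference copy) are assumptions chosen for illustration.

```python
def expected_vaf(purity, tumour_alt_copies, tumour_total_copies,
                 normal_alt_copies=1, normal_total_copies=2):
    """Expected fraction of reads carrying the variant allele.

    Hypothetical helper: assumes reads are sampled in proportion to
    allele copies, with tumour cells making up `purity` of the sample
    and normal cells heterozygous for the variant.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be in [0, 1]")
    alt = purity * tumour_alt_copies + (1 - purity) * normal_alt_copies
    total = purity * tumour_total_copies + (1 - purity) * normal_total_copies
    return alt / total


if __name__ == "__main__":
    for purity in (0.2, 0.5, 0.8, 1.0):
        # Copy-neutral LOH: tumour cells carry two copies of the variant allele.
        copy_neutral = expected_vaf(purity, 2, 2)
        # Deletion LOH: tumour cells carry one copy, the reference copy is lost.
        deletion = expected_vaf(purity, 1, 1)
        print(f"purity={purity:.1f}  copy-neutral LOH={copy_neutral:.3f}  "
              f"deletion LOH={deletion:.3f}")
```

Under these assumptions the expected fraction moves from 0.5 towards 1 as purity increases, so a filter tuned for frequencies near 0.5 could plausibly reject the variant.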

This is related to issue #173, reported earlier.

Version

$ octopus --version
octopus version 0.7.2 (728bdb81)
Target: x86_64 Linux 3.10.0-1127.18.2.el7.x86_64
SIMD extension: AVX2
Compiler: GNU 9.3.0
Boost: 1_75

Command

Command line to run octopus: See #173

Additional context: See #173

dancooke commented 2 years ago

Thanks for the issue report. The genotyping model can account for LOH - it does indeed assume a diploid germline background, but the haplotype mixture frequencies are continuous and can tend to zero.
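As a toy illustration of a continuous mixture frequency tending to zero, the sketch below fits the frequency of the reference haplotype at a single site by grid search over a binomial likelihood. This is not Octopus's model: the functions `log_likelihood` and `best_ref_frequency`, the symmetric `error_rate`, and the read counts are all assumptions made for the example.

```python
import math


def log_likelihood(ref_freq, n_ref, n_alt, error_rate=0.01):
    """Log-likelihood of the read counts for a two-haplotype mixture.

    `ref_freq` is the mixture frequency of the reference haplotype; a
    read is assumed to report the wrong allele with probability `error_rate`.
    """
    p_ref = ref_freq * (1 - error_rate) + (1 - ref_freq) * error_rate
    return n_ref * math.log(p_ref) + n_alt * math.log(1 - p_ref)


def best_ref_frequency(n_ref, n_alt, grid_size=1001, error_rate=0.01):
    """Grid-search estimate of the reference haplotype frequency in [0, 1]."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda f: log_likelihood(f, n_ref, n_alt, error_rate))


if __name__ == "__main__":
    # Hypothetical (ref, alt) read counts, from balanced to complete loss.
    for n_ref, n_alt in ((50, 50), (20, 80), (5, 95), (0, 100)):
        print(n_ref, n_alt, best_ref_frequency(n_ref, n_alt))
```

In this simplified setting the fitted reference frequency shrinks as reference reads become rarer, reaching zero when none are observed, without any change to the assumed diploid set of haplotypes.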

Filtering is a separate issue, and it's true that germline variants in LOH regions may be incorrectly flagged by the AF or AFB filter, and will probably have poorly calibrated quality scores from the random forest model. The main issue is that LOH is not explicitly called - Octopus does not call CNVs in the cancer model - so any variants not marked SOMATIC are filtered in the same way as normal variants (i.e. with --forest-model if using random forest filtering, or --filter-expression if using threshold filtering). I agree this is problematic, and maybe it makes sense to have separate filtering criteria for tumour germline variants - I'll have a think about this.