odelaneau / GLIMPSE

Low Coverage Calling of Genotypes
MIT License

Genotyping tumor samples that have subclonal loss of heterozygosity #51

Closed teng-gao closed 3 years ago

teng-gao commented 3 years ago

Hi authors,

I am applying your method to cancer sequencing samples, where large regions of a chromosome may be affected by subclonal loss of heterozygosity (LOH). This results in allelic imbalance at heterozygous SNPs, whose allelic ratio (AR = AD/DP) deviates from the expected ratio (AR = 0.5 in the plot below). GLIMPSE (or perhaps the bcftools genotyping step) seems to mistake heterozygous SNPs for homozygous ones when the AR deviates too far from 0.5 (for example, in the <0.2 and >0.8 ranges, shown by the purple and orange dots below). Is there a way to avoid this issue, for example by tuning parameters of the genotype likelihood step?

image

srubinacci commented 3 years ago

Hi, sorry for the delayed answer. This is a very interesting question. Indeed, genotype callers can struggle in this setting, as they may mistake allelic imbalance for genotyping errors (imputation should alleviate this problem, but imputation of heterozygous genotypes is always challenging). Unfortunately, I think this situation needs dedicated modelling to be handled properly, and I am not aware of any method that can easily take it into account. What is the coverage of your samples?

teng-gao commented 3 years ago

Thanks for the reply. The coverage of my samples is about 20x, but ideally the genotype confidence should reflect different coverage depths. The error model used by BCFtools seems to be too conservative. Is it possible to supply our own genotype probabilities (e.g. 0.8 het, 0.2 homozygous) to GLIMPSE?

srubinacci commented 3 years ago

It would be possible, yes. In that case you would need to provide an input VCF file with your genotype likelihoods in the FORMAT/PL field (remember that they are normalized and Phred-scaled) or in the FORMAT/GL field (using the --inputGL option of glimpse phase, v1.1.1).
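As a rough illustration of what "normalized Phred-scaled" means here, the sketch below converts three per-genotype values (0/0, 0/1, 1/1) into PL-style integers: each value becomes -10·log10(p), and the smallest result is subtracted so that the best genotype gets 0. The function name `probs_to_pl`, the cap of 255, and the choice to treat the probabilities from the question directly as likelihoods are assumptions made for this example, not something GLIMPSE or bcftools prescribes.

```python
import math

def probs_to_pl(probs, max_pl=255):
    """Convert per-genotype values (0/0, 0/1, 1/1) to normalized Phred-scaled integers.

    Hypothetical helper: the cap `max_pl` is an illustrative assumption.
    """
    # Avoid log10(0) by flooring each value at the one that maps to max_pl.
    floor = 10 ** (-max_pl / 10)
    raw = [-10 * math.log10(max(p, floor)) for p in probs]
    # Normalize so that the most likely genotype has PL = 0.
    best = min(raw)
    return [min(max_pl, round(x - best)) for x in raw]

# Example from the question: 0.2 homozygous reference, 0.8 heterozygous.
pl = probs_to_pl([0.2, 0.8, 0.0])
print(",".join(str(v) for v in pl))
```

The resulting comma-separated string is the form a FORMAT/PL value takes for a biallelic site; writing it into a VCF would still require a VCF library or tool of your choice.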

mcieslik-mctp commented 3 years ago

From reading the GLIMPSE code, I was under the impression that GLIMPSE discards GTs and only uses GLs. If that is truly the case, then even when the GT call is '0/0', the GL difference between 0/1 and 0/0 should be small, given the non-negligible amount of evidence for the ALT allele (or for the REF allele when the call is 1/1).
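To make the point about small GL differences concrete, here is a minimal sketch using a plain binomial read-count model. This is not the error model bcftools actually uses. The per-read error rate of 0.01 and the example counts (3 ALT reads out of 20, i.e. AR = 0.15 at roughly the coverage mentioned above) are assumptions chosen for illustration, and `binomial_gls` is a hypothetical helper.

```python
import math

def binomial_gls(alt_reads, depth, err=0.01):
    """log10 genotype likelihoods under a simple binomial model (illustrative only).

    Each genotype implies an expected ALT read fraction:
    0/0 -> err, 0/1 -> 0.5, 1/1 -> 1 - err.
    """
    alt_fraction = {"0/0": err, "0/1": 0.5, "1/1": 1 - err}
    n_choose_k = math.comb(depth, alt_reads)
    return {
        gt: math.log10(n_choose_k * p**alt_reads * (1 - p) ** (depth - alt_reads))
        for gt, p in alt_fraction.items()
    }

# A het site under strong allelic imbalance: 3 ALT reads out of 20.
gls = binomial_gls(alt_reads=3, depth=20)
for gt, gl in gls.items():
    print(f"{gt}: {gl:.3f}")
```

Under these assumptions the 0/0 and 0/1 likelihoods come out within a fraction of a log10 unit of each other, so a hard GT call can flip to homozygous even though the likelihoods still carry substantial support for the heterozygous genotype.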

srubinacci commented 3 years ago

> From reading the GLIMPSE code I was under the impression that GLIMPSE discards GTs and only uses GLs

This is correct. The GT field is not read from the input target file. Only FORMAT/PL is used (or FORMAT/GL with the --inputGL option).