odelaneau / GLIMPSE

Low Coverage Calling of Genotypes
MIT License

Questions about multiple samples imputation by GLIMPSE2 #219

Open raksasa opened 2 months ago

raksasa commented 2 months ago

Hi, after searching the documentation and existing issues, I am still unsure about two things:

  1. Does running GLIMPSE2_phase on multiple samples together improve imputation accuracy? I thought the GLIMPSE1 algorithm did not benefit from multiple samples.
  2. Which input is recommended: BAM, or VCF with GLs pre-computed by bcftools? Does the choice affect accuracy?

Please help. Thanks in advance!

srubinacci commented 1 month ago

Hi,

  1. GLIMPSE2 does reference-only imputation, while GLIMPSE1 also uses other target individuals in the conditioning set. So it's actually GLIMPSE2 that does not benefit from multiple samples. Pooling samples together is useful in GLIMPSE2 only for efficiency reasons.

  2. In general, pre-computing likelihoods is useful for higher coverages, as more complex models can be used. For coverages below 1x, it likely does not make much of a difference.

Hope this helps

Simone

raksasa commented 3 weeks ago

Thanks for the reply!

But regarding the first question: I imputed 10 samples both independently and pooled together, and the resulting genotypes were not identical. Approximately 2.5% of the genotypes differed between the two runs. Around 30% of these were heterozygous genotypes with haplotype switches (0|1 ↔ 1|0), 40% changed between heterozygous and homozygous alternate genotypes (0/1 ↔ 1/1), and the remaining 30% changed between homozygous reference and a variant genotype. How should I interpret these differences?
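
For reference, the kind of comparison described above can be sketched roughly as follows. This is only an illustration, not the script actually used: the function names (`alleles`, `classify`, `summarize`) and category labels are hypothetical, it assumes diploid GT strings taken from the two runs at matching sites and samples, and it does not read VCF files itself.

```python
from collections import Counter


def alleles(gt):
    """Split a diploid GT string such as '0|1' or '0/1' into a tuple of allele strings."""
    return tuple(gt.replace("|", "/").split("/"))


def classify(gt_a, gt_b):
    """Label how two genotype calls for the same site and sample differ.

    Category names are illustrative only (an assumption, not GLIMPSE output).
    """
    a, b = alleles(gt_a), alleles(gt_b)
    if "." in a or "." in b:
        return "missing"
    if sorted(a) == sorted(b):
        # Same pair of alleles; only the allele order (phase orientation) may differ.
        # For unphased input ('0/1' vs '1/0') this label is not meaningful.
        return "concordant" if a == b else "phase_switch"
    hom_ref_a = all(x == "0" for x in a)
    hom_ref_b = all(x == "0" for x in b)
    if hom_ref_a or hom_ref_b:
        # One call is 0|0 and the other carries at least one alternate allele.
        return "ref_vs_variant"
    if (a[0] == a[1]) != (b[0] == b[1]):
        # One call is heterozygous and the other homozygous non-reference.
        return "het_vs_hom_alt"
    return "other"


def summarize(pairs):
    """Count categories over an iterable of (independent_gt, pooled_gt) pairs."""
    return Counter(classify(x, y) for x, y in pairs)


if __name__ == "__main__":
    # Toy genotypes for illustration only, not real data.
    toy = [("0|1", "1|0"), ("0|1", "1|1"), ("0|0", "0|1"), ("1|1", "1|1")]
    for category, count in summarize(toy).items():
        print(category, count)
```

Note that the orientation of a single heterozygous call (0|1 versus 1|0) is only meaningful relative to neighbouring heterozygous sites in the same sample, so the `phase_switch` count from a per-site comparison like this is not the same thing as a switch error rate.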