odelaneau / shapeit5

Segmented HAPlotype Estimation and Imputation Tool
https://odelaneau.github.io/shapeit5/
MIT License
56 stars 9 forks source link

rare phasing yields PP=0.5 at all singletons #56

Closed giorkala closed 9 months ago

giorkala commented 11 months ago

Hello,

Firstly, thanks for developing this software suite and also helping with any issues here. I'd like to report a potential issue I ran into when using newer versions of phase_rare (both the released and the one by Simone in issue #34), regarding the posterior probabilities that SHAPEIT reports.

In our cohort (WES for 39k individuals), phasing rare variants is completed without errors (including phasing common variants, as suggested), but when assessing the quality I found out that all variants with MAC=1 have PP=0.5 exactly. I then discussed this with collaborators working with UKBB who noticed the same behaviour when switching to the newest release. However, using SHAPEIT5 v1.0.0 (preprint version) on the same data yielded

bcftools view $BCF -C 1 -Ou | bcftools query -i'GT="het"' -f'[%CHROM:%POS:%REF:%ALT %GT %PP \n]'
chr21:10413627:C:G 0|1 0.717
chr21:10413631:C:T 1|0 1
chr21:10413638:T:A 0|1 1
chr21:10413701:G:A 1|0 0.5
chr21:10413717:A:G 1|0 0.956
chr21:10413728:C:T 0|1 0.71
chr21:10413765:G:A 1|0 0.576
chr21:10413781:G:A 1|0 0.798
chr21:10413809:G:C 0|1 1
chr21:10413818:A:G 0|1 0.733
chr21:10413819:G:A 1|0 0.686
chr21:10414945:C:T 1|0 0.822
chr21:10414947:A:T 1|0 0.98
…

thus, a reasonable distribution of PPs. Based on that, I tried to phase using older versions myself, but got the following errors:

Could you please look into this and check if we're missing something, or if there's a bug in the current version?

Thank you very much, Georgios

odelaneau commented 10 months ago

Hi George,

I've just replied to you by email and repost here in case it is useful to someone else.

Singelton phasing score will be back in the next software release (as soon as I find time to make it). In the meantime, you could recompile the code after:

  1. Un-comenting l132 in phase_rare/src/containers/genotype_set/genotype_set_phasing.cpp.
  2. Commenting l133 in the same file. This will bring back singleton phasing scores in the output.

Cheers,

giorkala commented 9 months ago

Hi,

Thanks a lot Olivier, this is to mention that the suggestion worked!

Just note that, for those who get the "invalid CONTIG id" error, you need to make this change in the contig branch of the project.

Best wishes, GK