statgen / hds-util

Mozilla Public License 2.0

Different site lists in empirical dose files from MIS #3

Open dtaliun opened 4 months ago

dtaliun commented 4 months ago

Hi Jonathon,

When merging dose files from MIS, the imputed site lists are guaranteed to be identical as long as no R2 filters are applied, because MIS outputs even quasi-monomorphic variants. However, the empirical dose files may differ. When we split large GWASs into two batches, we sometimes end up with a few typed variants that are monomorphic in one batch but not in the other. MIS eliminates monomorphic typed variants, so these variants are missing from the empirical dose file of one batch but present in that of the other. Thus, we ended up with different site lists and merging errors.

Would you happen to have any suggestions on how to overcome this (without redoing the imputation)? Can the empirical dose files still be merged? Given that these are typically just a few variants, do you think downstream MetaMinimac will complain if we remove them from the empirical dose files?

Thanks, Daniel

jonathonl commented 4 months ago

MetaMinimac wouldn't know the difference if you manually removed such variants before merging. I think that's the only option at the moment. It's unfortunate that MIS drops monomorphic variants (I'm not sure why they would do that). Thanks for raising this issue. I'll mention this to the MIS team, but I may end up modifying hds-util to automatically drop such variants when merging.

dtaliun commented 4 months ago

Thanks Jonathon,

I will leave a quick bcftools command here for removing such variants manually, in case someone else reads this issue:

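# Keep only the sites present in both batches (bcftools isec needs bgzipped, indexed inputs)
bcftools isec -n=2 batch1.empiricalDose.vcf.gz batch2.empiricalDose.vcf.gz -p temp
# Merge the two site-matched files written by isec
hds-util -Ovcf.gz -o merged.empiricalDose.vcf.gz temp/0000.vcf temp/0001.vcf
rm -rf temp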
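
Before dropping anything, it may be worth checking which typed variants are present in only one batch. Below is a minimal sketch, not part of hds-util or bcftools: the function name `unshared_sites` and the `.sites` file names are made up for illustration. It assumes each site list is a plain text file with one CHROM, POS, REF, ALT line per variant.

```shell
# Hypothetical helper: report sites that occur in only one of two site lists.
# Each input file holds one site per line as CHROM<TAB>POS<TAB>REF<TAB>ALT.
unshared_sites() {
    us_a=$(mktemp)
    us_b=$(mktemp)
    # comm requires sorted input; fix the locale so both files sort identically.
    LC_ALL=C sort -u "$1" > "$us_a"
    LC_ALL=C sort -u "$2" > "$us_b"
    # comm -23 keeps lines unique to the first file, comm -13 those unique to the second.
    LC_ALL=C comm -23 "$us_a" "$us_b" | awk '{print "only_in_first\t" $0}'
    LC_ALL=C comm -13 "$us_a" "$us_b" | awk '{print "only_in_second\t" $0}'
    rm -f "$us_a" "$us_b"
}
```

The site lists could be produced with `bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\n' batch1.empiricalDose.vcf.gz > batch1.sites` (and likewise for batch 2), followed by `unshared_sites batch1.sites batch2.sites`. If only a handful of lines come back, those are the variants that the `isec -n=2` step will remove.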