Open brettva opened 1 year ago
Thanks for providing this example. I'm not seeing the same output as you. What operating system are you running on (and what operating system did you compile hds-util on)? I'd like to reproduce your environment.
@jonathonl Thank you so much for getting back to me so quickly. I tried compiling and running independently on the csg and armis clusters, and I see the issue in both environments.
I am not sure what details would be most helpful for you, but here are a few:
csg:
lsb_release -a:
LSB Version: core-11.1.0ubuntu2-noarch:security-11.1.0ubuntu2-noarch
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
gcc --version:
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
armis:
lsb_release -a:
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterprise
Description: Red Hat Enterprise Linux release 8.6 (Ootpa)
Release: 8.6
Codename: Ootpa
gcc --version:
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10)
Not sure if it matters but I believe in both cases the version of sav that was available at time of compiling was:
sav v2.1.0
If you need any other info please let me know
This should now be fixed with https://github.com/statgen/hds-util/commit/763bb2d62b53654b67d7e678d99ad309abe43d0f. Please rebuild with the latest from the master branch.
@jonathonl I really appreciate the time you put into fixing that, especially so fast. It looks better on my end now.
Another quick question, and sorry if I am missing it somewhere. We certainly want MAF and Rsq recomputed in our merged data, but what is the point of recomputing DS, GT, GP from HDS?
Is it just so that these numbers can be recapitulated from the HDS that appears in the VCF? Regardless, is it always recommended to update DS, GT, GP with -f DS, GT, GP when merging? IIUC from this issue, at least Rsq is originally based on more precise HDS than what is seen in the VCF.
It's recomputed for the sake of simpler code. There is a plan for future versions of the imputation server to only export HDS in the output files in order to reduce compute and storage costs. Most people don't need all four FORMAT fields, so hds-util allows you to generate only the fields needed by a user for downstream analysis.
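To make the relationship between HDS and the other FORMAT fields concrete, here is a minimal sketch for one diploid sample at a biallelic site. It is not taken from the hds-util source: the function name `fields_from_hds`, the 0.5 calling threshold for GT, and the three-decimal rounding are assumptions for illustration, and the GP formula assumes the two haplotypes are independent.

```python
def fields_from_hds(h1: float, h2: float) -> dict:
    """Derive GT, DS and GP from the two haploid dosages of one diploid sample.

    Hypothetical helper for illustration; assumes a biallelic site.
    """
    # GT: call ALT on a haplotype when its dosage is at least 0.5 (assumed threshold).
    gt = f"{int(h1 >= 0.5)}|{int(h2 >= 0.5)}"
    # DS: expected ALT allele count is the sum of the haploid dosages.
    ds = h1 + h2
    # GP: probabilities of 0, 1 and 2 ALT alleles, treating haplotypes as independent.
    gp = (
        (1 - h1) * (1 - h2),
        h1 * (1 - h2) + h2 * (1 - h1),
        h1 * h2,
    )
    return {
        "GT": gt,
        "DS": round(ds, 3),
        "GP": tuple(round(p, 3) for p in gp),
    }


example = fields_from_hds(0.1, 0.9)
print(example)
```

Because everything here is a function of the two HDS values, keeping only HDS in the file loses nothing that could not be regenerated on demand.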
In the latest version of Minimac4, the Rsq is computed after the precision loss. But Imputation Server is still using the older version so that issue would still apply. In any case, the median difference is quite small and I suspect it would have negligible effects on Rsq filtering strategies.
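A rough way to see why the difference is small: the sketch below compares an Rsq estimate on full-precision haploid dosages with the same estimate after rounding. It assumes the Minimac-style estimate (variance of the haploid dosages divided by p(1-p)), uses synthetic dosages from a beta distribution, and assumes three decimals of stored precision. None of these choices come from the Minimac4 or hds-util source, and `estimated_rsq` is a hypothetical helper.

```python
import random


def estimated_rsq(hap_dosages):
    """Ratio of observed dosage variance to the binomial variance p(1-p)."""
    n = len(hap_dosages)
    p = sum(hap_dosages) / n
    if p <= 0.0 or p >= 1.0:
        # Monomorphic site: the ratio is undefined, report 0 by convention here.
        return 0.0
    var = sum((h - p) ** 2 for h in hap_dosages) / n
    return var / (p * (1 - p))


random.seed(0)
# Synthetic haploid dosages in (0, 1); purely illustrative, not real imputation output.
full = [random.betavariate(0.3, 1.5) for _ in range(2000)]
# Assumed precision loss: dosages stored with three decimals.
rounded = [round(h, 3) for h in full]

rsq_full = estimated_rsq(full)
rsq_rounded = estimated_rsq(rounded)
diff = abs(rsq_full - rsq_rounded)
print(f"full={rsq_full:.6f} rounded={rsq_rounded:.6f} diff={diff:.2e}")
```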
@jonathonl That makes a lot of sense thanks
Thank you for developing this tool, it will be quite handy for us.
In my merges I have been getting invalid genotypes (e.g. 0/-44) in addition to a mixture of phased and unphased sites. I imputed some publicly available HGDP samples on MIS to demonstrate this issue here
Do you have advice on how to proceed? Hopefully I am not doing something silly. Thanks
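For anyone wanting to check their own merged output for the same symptoms, here is a minimal sketch that classifies GT strings and counts invalid, phased, and unphased calls. It works on plain strings rather than any VCF library, and `check_gt` and its categories are hypothetical names for illustration, with biallelic sites assumed by default.

```python
import re
from collections import Counter


def check_gt(gt: str, n_alleles: int = 2) -> str:
    """Classify one GT string as 'phased', 'unphased', 'haploid' or 'invalid'."""
    alleles = re.split(r"[|/]", gt)
    for allele in alleles:
        # Each allele must be '.' (missing) or a non-negative integer index.
        if not re.fullmatch(r"\.|\d+", allele):
            return "invalid"
        # The index must refer to REF (0) or one of the ALT alleles.
        if allele != "." and int(allele) >= n_alleles:
            return "invalid"
    if len(alleles) == 1:
        return "haploid"
    return "phased" if "|" in gt else "unphased"


# Example genotype strings, including the invalid one reported above.
counts = Counter(check_gt(g) for g in ["0|1", "0/1", "0/-44", "1|1"])
print(counts)
```

Running this over the GT column of a merged file should make it easy to tell whether the invalid calls and the mixed phasing are confined to particular sites or samples.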