large-scale-gxe-methods / GEM

Issue with MAF when including a list of variants #13

Closed: lemieuxl closed this issue 2 years ago

lemieuxl commented 2 years ago

Awesome software!

We found an issue that occurs when we both include a list of variants (--include-snp-file) and filter on MAF (--maf 0.01, for example). Our genotypes are in BGEN format.

When reading the BGEN file, as soon as a variant gets filtered out because of MAF, no more variants are processed. Hence, if the fourth variant (from a list of 300k) has a MAF lower than the threshold, the results file contains only three variants.

Once a variant has been filtered out on MAF, the following condition is always true because of its second part (i.e. keepVariants[keepIndex] + 1 != snploop), so every later variant is skipped. https://github.com/large-scale-gxe-methods/GEM/blob/fc773b60f7bf60eee4d15ae8959e08d641392555/src/ReadBGEN.cpp#L927

We think that the fix could be to increment the keepIndex counter in the following block. https://github.com/large-scale-gxe-methods/GEM/blob/fc773b60f7bf60eee4d15ae8959e08d641392555/src/ReadBGEN.cpp#L1112

I could open a pull request if you want, but I'm unsure whether other counters should also be incremented (e.g. stream_i).
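To illustrate the behaviour described above, here is a minimal sketch of the counter logic. It is a simplified model and not the GEM source: only the names keepVariants, keepIndex and snploop come from this issue, while processVariants, maf, mafThreshold and advanceOnMafSkip are hypothetical names introduced for the example. It assumes keepVariants holds sorted 0-based variant indices and that snploop is a 1-based counter, and it does not model stream_i or any BGEN decoding.

```cpp
#include <cstddef>
#include <vector>

// Simplified model of the include-list + MAF filter interaction.
// NOT the GEM implementation: processVariants, maf, mafThreshold and
// advanceOnMafSkip are hypothetical; keepVariants, keepIndex and snploop
// are the names used in the issue.
// Assumption: keepVariants holds 0-based variant indices in ascending order.
std::vector<std::size_t> processVariants(const std::vector<std::size_t>& keepVariants,
                                         const std::vector<double>& maf,
                                         double mafThreshold,
                                         bool advanceOnMafSkip) {
    std::vector<std::size_t> processed;
    std::size_t keepIndex = 0;

    // snploop is a 1-based counter over all variants in the file.
    for (std::size_t snploop = 1; snploop <= maf.size(); ++snploop) {
        // Skip any variant that is not the next expected entry of the include list.
        if (keepIndex >= keepVariants.size() ||
            keepVariants[keepIndex] + 1 != snploop) {
            continue;
        }

        // MAF filter. If keepIndex is not advanced here, it keeps pointing at
        // the filtered variant, so the check above never matches again.
        if (maf[snploop - 1] < mafThreshold) {
            if (advanceOnMafSkip) {
                ++keepIndex;  // the fix proposed in the issue
            }
            continue;
        }

        processed.push_back(snploop - 1);  // variant passes both filters
        ++keepIndex;
    }
    return processed;
}
```

With keepVariants = {0, 1, 2, 3, 4, 5}, maf = {0.2, 0.3, 0.1, 0.001, 0.2, 0.4} and a threshold of 0.01, calling the function with advanceOnMafSkip = false returns only the first three variants, matching the symptom reported here. With advanceOnMafSkip = true it returns every variant except the fourth.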

pancong419 commented 2 years ago

Thank you for pointing out the issue. We have fixed it and pushed the changes to the dev branch, and we will release a new version soon. Please let us know if there are any further problems.

pancong419 commented 2 years ago

Version 1.4.4 has been released.