Closed aaronmcdaid closed 7 years ago
To recap more clearly:
1) Fixed a bug on Wednesday that resulted in slightly too many rows of output (and in a non-deterministic order) when multiple SNPs had the same position.
2) --impute.range, as in the document
3) --impute.snps, as in the document
4) the output file now has imputation quality too, and the two alleles, and a header line so you can tell what the columns are
5) I created more "realistic", but small, input files (gwas/UKB.parental.lifespan.every100thSNP.txt and ref/TWINSUK.every100thRS.chrm123.100people.vcf.gz) by taking 100 people and using only rs numbers ending in 00. This covers all the chromosomes and gives plenty of windows to test with. I should now test that making small adjustments to the window sizes makes only small changes to the output.
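For reference, the "rs numbers ending in 00" thinning can be sketched roughly like this. This is only an illustration in Python, not the script I actually used; the helper names `ends_in_00` and `thin_snps` are made up for the example, and it assumes IDs look like `rs` followed by digits.

```python
import re

# Assumption: a SNP ID is kept only if it is "rs" + digits and the digits end in "00".
_RS_ENDING_00 = re.compile(r"rs\d*00")

def ends_in_00(snp_id: str) -> bool:
    """True for IDs such as rs100 or rs4500; False for rs12345 or non-rs names."""
    return _RS_ENDING_00.fullmatch(snp_id) is not None

def thin_snps(snp_ids):
    """Keep roughly every 100th rs number, preserving the input order."""
    return [s for s in snp_ids if ends_in_00(s)]

example_ids = ["rs100", "rs12345", "rs4500", "1:12345_A_G", "rs7"]
print(thin_snps(example_ids))
```

Anything that isn't a plain rs number (for example a chr:pos style name) is dropped by this filter, which is what keeps the test files small.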
Update Friday night: I screwed up a bit in GitHub. I merged too much stuff directly into your repository, not as a merge request. Anyway, it's not a problem, it just doesn't seem very tidy to me. It includes the bugfix from Wednesday, but also now support for --impute.range and --impute.snps and other random stuff.
Original message, about the bugfix on Wednesday, follows: I think I've fixed that issue about the ordering of SNPs that I mentioned by email a few minutes ago. In fact, I discovered an unrelated bug when I looked more closely at it.
I should have been printing one output per target. But instead - for each distinct chr:pos - I was printing an imputation for each SNPname at that chr:pos. It was too complicated, and wrong. Fixing that bug should also make the output more deterministic.
You can have multiple SNPs at the same chr:pos, for example including indels. (Actually, I guess I shouldn't refer to indels as "SNPs"). I think I'm now handling those correctly, or at least more correctly than before.
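To illustrate the intended behaviour (one output row per target, in a reproducible order even when several targets share a chr:pos), here is a minimal Python sketch. It is not the actual code in the repository; the `Target` type, its fields, and `rows_to_print` are hypothetical names chosen for the example, and breaking ties by name is just one assumed way to make the order deterministic.

```python
from typing import NamedTuple

class Target(NamedTuple):
    # Hypothetical record for one imputation target.
    chrom: int
    pos: int
    name: str

def rows_to_print(targets):
    """Return exactly one row per target, sorted by (chrom, pos, name).

    Sorting on the full tuple means two targets at the same chr:pos
    (e.g. a SNP and an indel) always come out in the same order.
    """
    return sorted(targets)

targets = [
    Target(1, 5000, "rs200"),
    Target(1, 5000, "indel_5000"),  # same chr:pos as rs200
    Target(1, 1200, "rs100"),
]

for t in rows_to_print(targets):
    print(t.chrom, t.pos, t.name)
```

The point is that the loop runs over targets, not over distinct positions, so the number of rows always equals the number of targets.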