zkutalik / ssimp_software

GNU General Public License v3.0

loads of stuff! that bugfix from Wednesday about ordering. --impute.range, --impute.snps, and a decent header line on the output #8

Closed aaronmcdaid closed 7 years ago

aaronmcdaid commented 7 years ago

Update Friday night: I screwed up a bit in GitHub. I merged too much stuff directly into your repository instead of opening a merge request. It's not a problem, it just doesn't seem very tidy to me. The merge includes the bugfix from Wednesday, and now also support for --impute.range and --impute.snps, plus other random stuff.

Original message, about the bugfix on Wednesday, follows: I think I've fixed that issue about the ordering of SNPs that I mentioned by email a few minutes ago. In fact, I discovered an unrelated bug when I looked more closely at it.

I should have been printing one output per target. But instead - for each distinct chr:pos - I was printing an imputation for each SNPname at that chr:pos. It was too complicated, and wrong. Fixing that bug should also make the output more deterministic.

You can have multiple SNPs at the same chr:pos, for example including indels. (Actually, I guess I shouldn't refer to indels as "SNPs"). I think I'm now handling those correctly, or at least more correctly than before.
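To make the "one output per target" idea concrete, here is a minimal Python sketch. It is not the code in this repository: the `Target` record, its field names, and the `rows_per_target` helper are all made up for illustration. It assumes each target carries its own name, so that two variants at the same chr:pos stay distinct, and it sorts on the full key so the row order does not depend on container iteration order.

```python
from collections import namedtuple

# Hypothetical record type; the field names are illustrative assumptions,
# not taken from the ssimp source.
Target = namedtuple("Target", ["chrom", "pos", "name"])


def rows_per_target(targets):
    """Return exactly one output row per target, in a deterministic order."""
    # Sorting on (chrom, pos, name) gives variants that share a chr:pos
    # (for example a SNP and an indel) a stable relative order.
    ordered = sorted(targets, key=lambda t: (t.chrom, t.pos, t.name))
    return ["{}:{}\t{}".format(t.chrom, t.pos, t.name) for t in ordered]


targets = [
    Target(1, 2000, "rs300"),
    Target(1, 1000, "rs200"),
    Target(1, 1000, "rs100_indel"),  # same chr:pos as rs200
]
rows = rows_per_target(targets)
```

With three targets this gives three rows, and the two variants at 1:1000 always come out in the same order regardless of how the input list was arranged.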

aaronmcdaid commented 7 years ago

To recap more clearly:

1) Fixed a bug on Wednesday that resulted in slightly too many rows of output (and in a non-deterministic order) when multiple SNPs had the same position.
2) --impute.range, as in the document.
3) --impute.snps, as in the document.
4) The output file now has imputation quality too, and the two alleles, and a header line so you can tell what the columns are.
5) I created more "realistic", but small, input files (gwas/UKB.parental.lifespan.every100thSNP.txt and ref/TWINSUK.every100thRS.chrm123.100people.vcf.gz) by taking 100 people and using only rs numbers ending in 00. This covers all the chromosomes and gives plenty of windows to test with. I should now test that making small adjustments to the window sizes makes only small changes to the output.
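For item 5, a rough Python sketch of the two ideas: thinning to rs numbers ending in 00, and checking that two runs with slightly different window sizes give similar results. This is not how the files above were actually produced. Both helper names are hypothetical, and the dictionaries mapping SNP name to an imputed value are an assumed in-memory form of the output, not the real file format.

```python
import re

# Matches identifiers of the form rs<digits> whose number ends in "00".
RS_ENDING_00 = re.compile(r"^rs\d*00$")


def keep_rs_ending_in_00(snp_ids):
    """Keep only rs identifiers whose number ends in 00 (hypothetical helper)."""
    return [s for s in snp_ids if RS_ENDING_00.match(s)]


def max_abs_difference(run_a, run_b):
    """Largest absolute difference over the SNPs present in both runs.

    run_a and run_b are assumed to be dicts of SNP name -> imputed value.
    Returns 0.0 when the two runs share no SNPs.
    """
    shared = set(run_a) & set(run_b)
    return max((abs(run_a[k] - run_b[k]) for k in shared), default=0.0)


kept = keep_rs_ending_in_00(["rs100", "rs101", "rs12300", "1:1000", "rs5"])

# Made-up values standing in for the output of two runs with different windows.
window_small = {"rs100": 1.50, "rs200": -0.25}
window_large = {"rs100": 1.75, "rs200": -0.25, "rs300": 2.0}
diff = max_abs_difference(window_small, window_large)
```

A regression test could then assert that `diff` stays below some tolerance; what counts as a "small" change is a judgment call that this sketch leaves open.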