tomszar / HGDP_1000G_Merge

Merging the HGDP and 1000 Genomes reference samples
https://tomszar.github.io/HGDP_1000G_Merge/

HGDP SNPs extracted from 16 chromosomes in 1000G as shown in notebook #2

Open la-mendicino opened 2 years ago

la-mendicino commented 2 years ago

Hey Tom,

First, your pipeline is great, thank you! I've learned a lot from it as I do not come from a bioinformatics background.

I have a question about the step where the HGDP SNPs are extracted from the 1000G chromosomes. Why does the Jupyter notebook show SNPs being extracted from only 16 chromosomes, omitting chromosomes 4, 5, 6, 7, 8, and 9? Then, when you concatenate the chromosome files, it again combines only those 16 chromosomes. Any insight would be greatly appreciated. Thank you!

Lucas

tomszar commented 2 years ago

Hi Lucas, I think I did that to check whether the notebook was working without having to run all chromosomes, to save time. It should work if you run it on your end with all chromosomes. However, 1000G dropped the use of rsIDs, and I think I added a workaround, but I'm not entirely sure it works correctly because I haven't had time to test it thoroughly.
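
For anyone hitting the missing-rsID problem, one common approach is to match SNPs by chromosome and position instead of by ID. The following is only a minimal sketch of that idea, not necessarily the workaround used in the notebook. It assumes the HGDP SNP list is available as PLINK `.bim`-style lines (chromosome, ID, cM, position, alleles), that the 1000G data is read as plain VCF text lines, and that both datasets are on the same genome build. The function names `load_positions` and `filter_vcf_by_position` are hypothetical.

```python
def strip_chr(chrom):
    """Normalize 'chr1' and '1' to the same key so both naming styles match."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def load_positions(bim_lines):
    """Collect (chrom, pos) pairs from .bim-style lines: chrom, id, cM, pos, a1, a2."""
    wanted = set()
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        wanted.add((strip_chr(fields[0]), int(fields[3])))
    return wanted


def filter_vcf_by_position(vcf_lines, wanted):
    """Yield VCF header lines plus records whose (chrom, pos) is in `wanted`."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if (strip_chr(chrom), int(pos)) in wanted:
            yield line


# Toy data standing in for real files (assumption: same genome build in both).
bim = ["1 rs123 0 1000 A G", "4 rs456 0 2000 C T"]
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t1000\t.\tA\tG",
    "1\t1500\t.\tC\tT",
    "chr4\t2000\t.\tC\tT",
]
kept = list(filter_vcf_by_position(vcf, load_positions(bim)))
```

With real data, the same filter would be applied to each per-chromosome 1000G file for chromosomes 1 through 22 before concatenating, so that none of 4 through 9 are skipped. Matching on position alone does not check alleles, so multi-allelic sites or strand differences would still need a separate check.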