yufengwudcs / GTmix

GTmix: Inference of population admixture network from local gene genealogies under coalescent theory
GNU General Public License v3.0

Input #2

Open AgustoLuz opened 4 years ago

AgustoLuz commented 4 years ago

Hi all, I think the software is interesting, but I am a little curious about how to create the input from ped/vcf files. Since you processed the 1000 Genomes data, could you briefly describe how you created the input?

yufengwudcs commented 4 years ago

GTmix takes as input the local genealogies inferred by RENT+. RENT+ uses a simple format similar to ms. To analyze data in VCF format (I am not very familiar with ped), you need to convert the haplotypes stored in the VCF file (they must be phased) to this simple format: the first line lists the SNP positions, and each following line is one haplotype. Note that a VCF file may contain very long haplotypes (as in the 1000 Genomes data). What I did was choose some regions of a certain length (say 100 kb) and extract the haplotypes from these regions. I then used RENT+ to infer gene genealogies from the haplotypes of each region separately. Next, I chose a smaller number of genealogies using the TreePicker utility (since there can be a very large number of trees) and merged the chosen trees into one file. You can then run GTmix on this single file, which contains all the chosen trees from all the loci.
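
For the VCF conversion step, here is a rough Python sketch of how one region could be extracted. It is not part of GTmix or RENT+, and the function names `vcf_region_to_haplotypes` and `format_simple` are made up for illustration. It assumes an uncompressed VCF with phased genotypes (`0|1` style) and keeps only biallelic SNPs. It also assumes the positions on the first line can be written as plain base-pair coordinates separated by spaces; please check the RENT+ documentation for the exact layout it expects.

```python
def vcf_region_to_haplotypes(vcf_lines, chrom, start, end):
    """Collect phased biallelic SNPs on `chrom` with start <= POS <= end.

    Returns (positions, haplotypes), where haplotypes holds one 0/1 string
    per haplotype (two per diploid sample, in sample order).
    Hypothetical helper: the name and signature are illustrative only.
    """
    positions = []
    columns = []  # one list of alleles per retained SNP
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if fields[0] != chrom:
            continue
        pos = int(fields[1])
        if not (start <= pos <= end):
            continue
        ref, alt = fields[3], fields[4]
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue  # keep biallelic SNPs only
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        alleles = []
        phased = True
        for sample in fields[9:]:
            parts = sample.split(":")[gt_idx].split("|")
            if len(parts) != 2 or any(a not in ("0", "1") for a in parts):
                phased = False  # unphased, missing, or non-binary genotype
                break
            alleles.extend(parts)
        if phased:
            positions.append(pos)
            columns.append(alleles)
    n_hap = len(columns[0]) if columns else 0
    # Transpose: each haplotype is the string of its alleles across SNPs.
    haplotypes = ["".join(col[h] for col in columns) for h in range(n_hap)]
    return positions, haplotypes


def format_simple(positions, haplotypes):
    """Render the positions line followed by one haplotype per line.

    Assumption: space-separated base-pair positions on the first line.
    """
    lines = [" ".join(str(p) for p in positions)]
    lines.extend(haplotypes)
    return "\n".join(lines) + "\n"
```

You would call `vcf_region_to_haplotypes` once per chosen region (for example each 100 kb window), write the result of `format_simple` to its own file, and then run RENT+ on each file separately as described above.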