szpiech / selscan

Haplotype based scans for selection
GNU General Public License v3.0

.hap file of selscan input #33

Closed by Lee211 5 years ago

Lee211 commented 5 years ago

Each variant has two or more alleles, but the .hap file requires a 0 or 1 for each variant. For example, for a SNP with A and G alleles, how do I code the genotypes (AA, AG, GG) in the .hap file?

szpiech commented 5 years ago

The .hap file format encodes one biallelic locus per column and one haplotype per row. A diploid individual therefore contributes two haplotypes, i.e. two rows per person. You must then encode each allele as 0 or 1, however you wish. Common choices are 1 for derived and 0 for ancestral, or 1 for alternate and 0 for reference.
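As a rough illustration of that layout, here is a minimal Python sketch for the A/G example from the question. The function name `encode_haplotypes` and its input structure are hypothetical (they are not part of selscan), and the sketch assumes the genotypes are already phased and that A is the allele chosen to be coded as 0.

```python
def encode_haplotypes(phased_genotypes, zero_alleles):
    """Turn phased genotypes into 0/1 haplotype rows (two rows per diploid person).

    phased_genotypes[i][j] is a (first_copy, second_copy) allele pair for
    individual i at locus j. zero_alleles[j] is the allele coded as 0 at
    locus j; any other allele at that locus is coded as 1.
    This helper is hypothetical and only illustrates the layout.
    """
    rows = []
    for person in phased_genotypes:
        for copy in (0, 1):  # one output row per haplotype
            row = [0 if pair[copy] == zero else 1
                   for pair, zero in zip(person, zero_alleles)]
            rows.append(row)
    return rows


# Three individuals at a single A/G SNP: AA, AG, GG, with A coded as 0.
genotypes = [[("A", "A")], [("A", "G")], [("G", "G")]]
rows = encode_haplotypes(genotypes, zero_alleles=["A"])

# One haplotype per line, loci separated by whitespace.
hap_text = "\n".join(" ".join(str(x) for x in row) for row in rows)
print(hap_text)
```

With a single locus each row holds one value; with more loci, each row simply gains one column per locus, which is why the phase of heterozygous genotypes matters.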

Lee211 commented 5 years ago

I got a .hap file from bim/bed/fam files using SHAPEIT2. In this .hap file, each row is a variant and each person is coded by two columns of 0s and 1s. If I transform this file so that two rows represent a person and each column codes a variant, is that OK for selscan? Do I need to think about the meaning of 0 and 1, e.g. derived or ancestral? Does it make a difference whether 0 codes for the ancestral allele or the derived allele?

szpiech commented 5 years ago

I believe if you transpose this .hap file from SHAPEIT2 it should work. You can code your variants any way you prefer; selscan just requires 0/1 coding to run. If you don't have ancestral/derived information, you can assign 0/1 arbitrarily.
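A minimal sketch of the transpose in Python, not a tested pipeline. It assumes the SHAPEIT2 output has one variant per line with five leading metadata columns (chromosome or ID, variant ID, position, and the two alleles) before the 0/1 haplotype columns; check your own file and adjust `n_meta_cols` if it differs. The function name `shapeit_haps_to_selscan_hap` is hypothetical.

```python
def shapeit_haps_to_selscan_hap(haps_lines, n_meta_cols=5):
    """Transpose variant-per-row lines into haplotype-per-row lines.

    haps_lines: iterable of whitespace-delimited lines, one variant each.
    n_meta_cols: number of leading non-haplotype columns to drop
                 (assumed to be 5 for SHAPEIT2 output; verify on your file).
    Returns a list of strings, one per haplotype, one column per variant.
    """
    per_variant = []
    for line in haps_lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        per_variant.append(fields[n_meta_cols:])
    # zip(*...) turns the list of variant rows into haplotype rows.
    return [" ".join(hap) for hap in zip(*per_variant)]


# Made-up example: two variants, two diploid individuals (four haplotypes).
example = [
    "1 rs1 100 A G 0 1 1 1",
    "1 rs2 200 C T 0 0 1 0",
]
hap_rows = shapeit_haps_to_selscan_hap(example)
print("\n".join(hap_rows))
```

For real data you would read the lines from the SHAPEIT2 file and write `hap_rows` to a new file, one row per line. The whole file is held in memory here, which is fine for a sketch but may not suit very large datasets.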