szpiech / selscan

Haplotype based scans for selection
GNU General Public License v3.0

questions about xpehh results and ihs results #80

Open sadiexiaoyu opened 2 years ago

sadiexiaoyu commented 2 years ago

Hi, Szpiech,

I calculated the fraction of SNPs with extreme scores (|iHS| > 2 for iHS; xpEHH > 2 or xpEHH < -2 for xpEHH) in 51-SNP windows, using the normalized iHS and xpEHH results, respectively.

In the iHS result, the windows in the top 1% contain around 30 SNPs with |iHS| > 2 per 51-SNP window.

In the xpEHH result, the windows in the top 2% for the population of interest contain 51 SNPs with xpEHH > 2 per 51-SNP window, while the windows in the top 2% for the reference population contain 5 SNPs with xpEHH < -2 per 51-SNP window.

This suggests that the distribution of SNPs with positive-selection signals is skewed quite differently in the two populations used for the xpEHH test.

Is this normal? I am wondering whether I am missing something.

Looking forward to your suggestion!

szpiech commented 2 years ago

Hello,

So, with XPEHH the sign of the statistic is important. Positive scores suggest possible adaptation in the 'interested' population (that which you passed with --vcf, --tped, etc.), and negative scores suggest possible adaptation in the reference population (that which you passed with --vcf-ref, --tped-ref, etc.).
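As a minimal sketch of this sign convention (not part of selscan itself), the snippet below labels normalized XP-EHH scores by the population they point to. The function name `classify_xpehh`, the cutoff of 2 (taken from the question above), and the example scores are all illustrative assumptions.

```python
def classify_xpehh(score, threshold=2.0):
    """Label a normalized XP-EHH score by the population it points to.

    Hypothetical helper: the threshold of 2 follows the cutoff used in the
    question; it is an assumption, not a selscan default asserted here.
    """
    if score > threshold:
        # Positive extreme: population passed with --vcf, --tped, etc.
        return "target"
    if score < -threshold:
        # Negative extreme: population passed with --vcf-ref, --tped-ref, etc.
        return "reference"
    return "neither"


# Made-up scores, for illustration only.
scores = [2.7, -0.4, -3.1, 1.9, 2.0]
labels = [classify_xpehh(s) for s in scores]
print(labels)
```

Because the two tails are counted separately, there is no requirement that they be the same size.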

I'm not entirely understanding your question, but perhaps this helps? Please let me know.

sadiexiaoyu commented 2 years ago

Thank you for the quick response! Yes, I ran selscan as follows:

selscan --xpehh --hap [target population] --ref [reference population] --map [target population.map] --out [output file]

Then I normalized the results with:

norm --xpehh --files [autosome chromosomes]

Next, I wanted to find SNPs with signals of positive selection. I calculated the fraction of SNPs with xpEHH > 2 (1 in the norm file) and xpEHH < -2 (-1 in the norm file) in 51-SNP windows, respectively. Since positively selected SNPs should cluster together, I checked the number of such SNPs in the top 2% of these 51-SNP windows.

I found that the windows in the top 2% for the population of interest contain 51 SNPs with xpEHH > 2 per 51-SNP window, while the windows in the top 2% for the reference population contain 5 SNPs with xpEHH < -2 per 51-SNP window. Does this mean that the genome of the target population has more regions with clustered SNPs showing a positive-selection signal than the reference population?
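The windowing step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the exact procedure used in the thread: windows are taken as consecutive and non-overlapping, "top 2%" is read as ranking windows by their count of flagged SNPs, and the per-SNP flags (1, 0, -1) are toy values rather than real norm output. The helper names `window_counts` and `top_windows` are hypothetical.

```python
import math


def window_counts(flags, sign, size=51):
    """Count SNPs whose flag equals `sign` in consecutive,
    non-overlapping windows of `size` SNPs (trailing partial window dropped)."""
    counts = []
    for start in range(0, len(flags) - size + 1, size):
        window = flags[start:start + size]
        counts.append(sum(1 for f in window if f == sign))
    return counts


def top_windows(counts, top_fraction=0.02):
    """Return indices of the highest-count windows (always at least one)."""
    n_top = max(1, math.ceil(len(counts) * top_fraction))
    ranked = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)
    return ranked[:n_top]


# Toy flags for four 51-SNP windows: 1 means xpEHH > 2, -1 means xpEHH < -2.
flags = [1] * 51 + [0] * 51 + [-1] * 5 + [0] * 46 + [0] * 51

pos_counts = window_counts(flags, 1)    # signal toward the target population
neg_counts = window_counts(flags, -1)   # signal toward the reference population

print(pos_counts, top_windows(pos_counts))
print(neg_counts, top_windows(neg_counts))
```

Counting the positive and negative tails separately, as here, keeps the two populations' signals apart, so the top windows for each can legitimately hold very different numbers of flagged SNPs.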