szpiech / selscan

Haplotype based scans for selection
GNU General Public License v3.0

xpnsl norm result #68

Closed jiangzy26 closed 2 years ago

jiangzy26 commented 2 years ago

Hi, may I ask a question about the output of normalized xpnsl or xpehh? The result contains both max and min scores. Which one should I use? Thank you.

szpiech commented 2 years ago

Hello. Extreme negative scores suggest long high-frequency haplotypes (and a possible sweep) in the reference population (i.e., the population you provided with --vcf-ref), and extreme positive scores suggest the same in the other population (--vcf). Which one to use depends on your underlying question. If you're particularly interested in one population or the other, you would use the scores that correspond to that population (negative for the ref population, positive for the other). If you're interested in sweeps in both populations, you would consider both.

jiangzy26 commented 2 years ago

Hi Prof. Zachary A Szpiech,

Thank you for your reply. However, when I use the norm function in your software package (for example, on xpnsl output), I get a max score and a min score, and both can take positive and negative values. Could you explain what they mean and which one I should use?

Thank you very much.

All the best,
Zhiyong Jiang


szpiech commented 2 years ago

If you are interested in putative sweeps in the first population (--vcf), you should look at the max scores; if you are interested in putative sweeps in the second population (--vcf-ref), you should look at the min scores.
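As a rough illustration of that rule, the sketch below ranks windows by max score when the population of interest is the one passed with --vcf, and by min score when it is the one passed with --vcf-ref. The tuple layout `(start_bp, max_score, min_score)` and the function name `rank_windows` are assumptions made for this example; they are not selscan's or norm's actual file format or API.

```python
def rank_windows(windows, population):
    """Order windows from strongest to weakest candidate for one population.

    windows: list of (start_bp, max_score, min_score) tuples (hypothetical layout).
    population: "vcf" (first population) or "vcf-ref" (reference population).
    """
    if population == "vcf":
        # Sweeps in the --vcf population show up as extreme positive scores,
        # so sort by the per-window max score, largest first.
        return sorted(windows, key=lambda w: w[1], reverse=True)
    if population == "vcf-ref":
        # Sweeps in the --vcf-ref population show up as extreme negative scores,
        # so sort by the per-window min score, most negative first.
        return sorted(windows, key=lambda w: w[2])
    raise ValueError("population must be 'vcf' or 'vcf-ref'")


# Made-up example values, not real scan output.
windows = [(0, 3.1, -0.3), (100_000, 1.0, -2.8), (200_000, 0.4, -0.9)]
top_vcf = rank_windows(windows, "vcf")
top_ref = rank_windows(windows, "vcf-ref")
```

With these made-up values, the window starting at 0 ranks first for the --vcf population and the window starting at 100,000 ranks first for the --vcf-ref population.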

However, probably the best way to use the results from --bp-win in norm is to look at the estimated percentile for the window, in the direction you are interested in (I only really provide estimates for top 1, top 5, and top 100). For these 2-pop scans, as mentioned above, extreme negative and positive scores suggest possible sweeps in one or the other population, and looking for clusters of extreme scores provides stronger power to detect sweeps (I discuss this in Szpiech et al. 2021). What norm is doing here is, within each window (default 100kb), calculating the fraction of scores > 2 and, separately, the fraction of scores < -2. It then estimates, separately for each direction, which windows are in the top 1%. These are the windows you should be interested in, as they are the strongest candidates for sweeps in each population.
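The per-window summary described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not norm's actual implementation: the input is assumed to be a list of `(position_bp, normalized_score)` pairs, the function name `window_fractions` is hypothetical, and the percentile-estimation step is left out entirely.

```python
from collections import defaultdict


def window_fractions(sites, win_size=100_000, threshold=2.0):
    """Summarize normalized scores in non-overlapping windows.

    sites: iterable of (position_bp, normalized_score) pairs (assumed layout).
    Returns {window_start: (n_sites, frac_gt, frac_lt, max_score, min_score)}.
    """
    buckets = defaultdict(list)
    for pos, score in sites:
        start = (pos // win_size) * win_size  # window this site falls in
        buckets[start].append(score)

    summary = {}
    for start in sorted(buckets):
        scores = buckets[start]
        n = len(scores)
        frac_gt = sum(s > threshold for s in scores) / n   # fraction of scores > 2
        frac_lt = sum(s < -threshold for s in scores) / n  # fraction of scores < -2
        summary[start] = (n, frac_gt, frac_lt, max(scores), min(scores))
    return summary


# Made-up example values, not real scan output.
sites = [
    (1_000, 2.5), (50_000, -0.3), (99_999, 3.1),
    (120_000, -2.4), (150_000, 1.0), (180_000, -2.8), (190_000, 0.1),
]
result = window_fractions(sites)
```

In this toy input the first window has two of three scores above 2 and the second has two of four below -2. It also shows why a window's max and min can each be positive or negative: they are simply the largest and smallest score in the window, so the first window's min (-0.3) is only mildly negative while the second window's max (1.0) is positive even though that window is enriched for negative scores.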

It might help to look over <https://onlinelibrary.wiley.com/doi/10.1002/evl3.232>, as I discuss this there, although in that case I was only interested in the first population (i.e., positive scores).

Hope this helps, and let me know if I can answer any more questions.