szpiech / selscan

Haplotype based scans for selection
GNU General Public License v3.0

confused about the normalization results of xp-ehh and ihs #98

Open abcdefghijklmn97 opened 1 year ago

abcdefghijklmn97 commented 1 year ago

Hi szpiech

I am confused about the normalization results after using selscan to calculate and normalize XP-EHH and iHS. I am following an article that describes its method as: "XP-EHH and iHS were implemented using the program Selscan (v1.1.0). Results were normalized with a 20 kb window. The ratio of extreme scores (|score| >= 2) in each window were calculated. The top 1% of windows (with the highest ratio of extreme scores) were considered to be candidate selective regions." I want to select the top 1% of windows in the same way.
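
For reference, the procedure quoted above can be sketched in a few lines of Python. This is only an illustration of the article's description, not the implementation inside selscan's norm program: the function name extreme_score_windows, the (position, score) input format, and the choice to align windows to multiples of the window size are all assumptions made for the example.

```python
from collections import defaultdict


def extreme_score_windows(scores, winsize=20000, threshold=2.0, top_frac=0.01):
    """Rank fixed-size windows by their fraction of extreme scores.

    scores: iterable of (position_bp, normalized_score) pairs for one
    chromosome (hypothetical input format for this sketch).
    Returns (fractions, candidates), where fractions maps each window
    start to the fraction of scores with |score| >= threshold, and
    candidates lists the window starts in the top `top_frac` of windows.
    """
    total = defaultdict(int)
    extreme = defaultdict(int)
    for pos, score in scores:
        start = (pos // winsize) * winsize  # window containing this site
        total[start] += 1
        if abs(score) >= threshold:
            extreme[start] += 1

    fractions = {start: extreme[start] / n for start, n in total.items()}

    # Keep at least one window even when there are fewer than 1/top_frac.
    n_top = max(1, int(len(fractions) * top_frac))
    # Highest fraction first; ties broken by window start for determinism.
    ranked = sorted(fractions, key=lambda s: (-fractions[s], s))
    return fractions, ranked[:n_top]
```

The sketch ignores how many scores each window holds, so sparsely covered windows can rank highly by chance. The .windows files written by norm already report a per-window fraction and an approximate percentile, so in practice those columns can be used directly.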

My normalization commands are:

/SOFT/Selscan/selscan-2.0.0/bin/linux/norm --xpehh --files line.xpehh.out --bp-win --winsize 20000

/SOFT/Selscan/selscan-2.0.0/bin/linux/norm --ihs --files line.ihs.out --bp-win --winsize 20000

My first point of confusion: for XP-EHH, the normalized output file Chr01.xpehh.out.norm.20kb.windows has nine columns, including <fraction of XP-EHH scores > 2> and <fraction of XP-EHH scores < -2>.

To take the top 1% as described above, should I extract these two columns, sort by them, and choose the 1% of windows with the highest fractions as candidate selected regions?

Second, what does it mean when the value in the <fraction of XP-EHH scores > 2> or <fraction of XP-EHH scores < -2> column of Chr01.xpehh.out.norm.20kb.windows is -1?

Third, normalizing iHS gives me six columns of data, and I am not sure whether one of the column headers should be <frac of |iHS| > threshold>.

Fourth, to take the top 1% of iHS windows as described above, should I extract the top 1% of the <frac of |iHS| > threshold> column?

Finally, do you think the approach described in the article is a reliable way to find candidate windows, or do you have a better suggestion?

Thank you. Best wishes!

Jinhua Long

szpiech commented 1 year ago

Hello,

I will try to answer your questions as best I understand them. First, for XP-EHH, since the sign of the statistic is meaningful, there are two ways to search for windows with enriched scores: windows enriched for positive scores and windows enriched for negative scores. The XP-EHH normalization .windows file has the following columns:

<win start> <win end> <# scores in win> <frac scores gt threshold> <frac scores lt threshold> <approx percentile for gt threshold wins> <approx percentile for lt threshold wins> <max score> <min score>

The 4th and 6th columns give the fraction of scores greater than 2 and the approximate percentile of that fraction. If you are looking for sweeps in the population that you gave with --vcf (or --tped or --hap), pick out the windows with a "1" in the 6th column; these are the windows in the top 1%. If you are interested in possible sweeps in the other population (e.g., passed with --vcf-ref etc.), take the windows with a "1" in the 7th column. A -1 means there were not enough scores in the window, so that window was excluded from the analysis. For the iHS analysis, use the 5th column, which is the approximate percentile for enriched scores.

I often use 100kb windows, but presumably the paper you quote from had their reasons for choosing 20kb. The approach seems fine to me.

-Zachary

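
As a rough illustration of the selection rule described in this reply, the following Python sketch pulls out the windows labelled "1" in a chosen percentile column. It assumes the .windows file is whitespace-delimited with integer base-pair coordinates in the first two columns; the function name select_windows and the sample rows are invented for the example and are not real selscan output.

```python
def select_windows(lines, percentile_col, label=1):
    """Return (start, end) for windows whose percentile column equals `label`.

    lines: iterable of whitespace-delimited rows from a .windows file.
    percentile_col: 1-based column index. Following the reply above, use
    6 or 7 for XP-EHH and 5 for iHS.
    Windows marked -1 (too few scores) never match and are skipped.
    """
    selected = []
    for line in lines:
        fields = line.split()
        if len(fields) < percentile_col:
            continue  # blank or truncated line
        try:
            value = float(fields[percentile_col - 1])
        except ValueError:
            continue  # header or other non-numeric row
        if value == label:
            selected.append((int(fields[0]), int(fields[1])))
    return selected


# Made-up rows in the nine-column XP-EHH layout, for illustration only.
example_rows = [
    "1 100001 40 0.30 0.00 1 100 4.1 -1.2",
    "100001 200001 35 0.00 0.25 100 1 1.5 -3.9",
    "200001 300001 2 -1 -1 -1 -1 0.3 -0.2",
    "300001 400001 50 0.02 0.01 100 100 2.1 -2.0",
]
sweeps_in_main_pop = select_windows(example_rows, percentile_col=6)
sweeps_in_ref_pop = select_windows(example_rows, percentile_col=7)
```

With a real file, the rows could be read with `open(path)` and passed in directly, since iterating over a file object yields one line at a time.
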
abcdefghijklmn97 commented 1 year ago

Hi

Thank you for your timely reply. Please forgive my poor English and the "Translated with www.DeepL.com/Translator (free version)" footer. You have answered all of my questions. Thank you very much.

Best wishes

Jinhua Long


abcdefghijklmn97 commented 1 year ago

Hi,

Thank you for your careful reply. I'm sorry to bother you again, but after re-normalizing my data this morning following your last email, I am still confused about one point. I have reviewed the results. The normalization commands I used are:

/SOFT/Selscan/selscan-2.0.0/bin/linux/norm --xpehh --files $input/$line.xpehh.out --bp-win --winsize 100000

/SOFT/Selscan/selscan-2.0.0/bin/linux/norm --ihs --files $input/$line.ihs.out --bp-win --winsize 100000

Your email says "If you are looking for sweeps in the population that you gave with --vcf (or --tped or --hap) you would pick out the windows with a "1" in the 6th column, these are the windows in the top 1%." and "For the iHS analysis you would use the 5th column, which is the approximate percentile for enriched scores."

Does this mean that the 6th column of my xpehh.out.norm.100kb.windows file should contain "1" rather than "100"? In other words, is it normal for the 6th column of my file to be 100?

(I ask because I thought that an <approx percentile for gt threshold wins> value of 100% would be written as 1.)

I have the same doubt about the 5th column of the ihs.out.100bins.norm.100kb.windows file.

I have attached two files. They contain the results after sorting with the following two commands, respectively:

cat Chr01.xpehh.out.norm.100kb.windows | sort -nr -k 6 > sort.Chr01.xpehh.out.norm.100kb.windows

cat Chr01.ihs.out.100bins.norm.100kb.windows | sort -nr -k 5 > sort.Chr01.ihs.out.100bins.norm.100kb.windows

Finally, could you please check the two attached files to confirm that my normalization results and my understanding are correct?

Thank you

Jinhua Long


szpiech commented 1 year ago

Hello,

Most of the lines will have 100 in them, but the top 1% windows will be labeled with a 1.

-Zachary
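
One quick way to check this against a given file is to tally the labels in the percentile column. The sketch below is a hypothetical helper (label_counts is not part of selscan) and assumes whitespace-delimited rows. It also shows why a descending numeric sort such as `sort -nr -k 6` lists the 100-labelled windows first: the top 1% windows carry the smaller value 1 and end up near the bottom, just above any -1 rows.

```python
from collections import Counter


def label_counts(lines, percentile_col):
    """Count how often each label appears in a 1-based percentile column."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= percentile_col:
            counts[fields[percentile_col - 1]] += 1
    return counts


# Made-up rows in the nine-column XP-EHH layout, for illustration only.
example_rows = [
    "1 100001 40 0.30 0.00 1 100 4.1 -1.2",
    "100001 200001 35 0.00 0.25 100 1 1.5 -3.9",
    "200001 300001 2 -1 -1 -1 -1 0.3 -0.2",
    "300001 400001 50 0.02 0.01 100 100 2.1 -2.0",
]
counts = label_counts(example_rows, percentile_col=6)
```

In a genome-wide file, one would expect the count for "100" to dominate and the count for "1" to be small, consistent with the reply above.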


abcdefghijklmn97 commented 1 year ago

Hi,

Thank you for your reply.

Jinhua Long
