szpiech / selscan

Haplotype based scans for selection
GNU General Public License v3.0

norm xp-nSL #116

Open malteze2024 opened 4 months ago

malteze2024 commented 4 months ago

Hello! I am using the following code to normalize my xp-nSL data:

for reg in $(seq 1 26) ; do norm --xpnsl --files xpnslRusAust$reg.xpnsl.out ; done

As a result I get this file:

| id | pos | gpos | p1 | sL1 | p2 | sL2 | xpnsl | normxpnsl | crit |
|---|---|---|---|---|---|---|---|---|---|
| OAR6_5062320.1 | 5062320 | 26 | 0.273585 | 1.18328 | 0.40942 | 1.00629 | 0.070365 | 0.670019 | -1 |
| OAR6_5348444_X.1 | 5348445 | 27 | 0.377358 | 1.23887 | 0.365942 | 1.12419 | 0.042185 | 0.224824 | 0 |
| s63279.1 | 5521727 | 28 | 0.150943 | 1.68338 | 0.210145 | 1.65757 | 0.006709 | -0.33562 | 1 |

I don't fully understand the meaning of the crit column. Does -1 mark the lowest 1% of normalized values on this chromosome, and 1 the top 1%?

Also, can I use the data from the normxpnsl column to build an overall Manhattan plot of the normalized values? Thank you so much for your work and for your recommendations!

Best regards, Olesya

szpiech commented 3 months ago

Hello,

First, I would suggest normalizing all your chromosomes at once instead of individually, as you have indicated here with your for loop.
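As an illustration of this suggestion, here is a minimal sketch that assembles one joint `norm` call for all 26 chromosome files instead of 26 separate calls. It relies on `--files` accepting a whitespace-delimited list of files; the file name pattern is copied from the loop in the question, and `build_norm_command` is a hypothetical helper name, not part of selscan.

```python
import shlex

def build_norm_command(prefix="xpnslRusAust", suffix=".xpnsl.out", n_chrom=26):
    """Return the argument list for a single joint norm run.

    The prefix/suffix defaults mirror the file names used in the question;
    adjust them to match your own output files.
    """
    files = [f"{prefix}{i}{suffix}" for i in range(1, n_chrom + 1)]
    return ["norm", "--xpnsl", "--files", *files]

cmd = build_norm_command()
# Print the shell command so it can be inspected before running it.
print(shlex.join(cmd))
# To execute it from Python instead: subprocess.run(cmd, check=True)
```

Passing every file in one call lets the scores be normalized against the genome-wide distribution rather than against each chromosome separately.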

So, yes, the xpehh header is a typo. If you used the --xpnsl flag to compute your scores, then the values are XP-nSL scores. Sorry about that!

The crit column ought to indicate -1 when the normalized score is <lowercrit value and 1 when it is >uppercrit value. This is not a particularly interesting column, but I use it for some computations. This crit value is +/-2 by default, so I’m unsure why it has this output in this case.
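The rule described above can be sketched as follows. This is only an illustration of the stated behaviour, assuming thresholds of -2 and +2; it is not taken from the `norm` source code, and `crit_flag` is a hypothetical name.

```python
def crit_flag(score, lower=-2.0, upper=2.0):
    """Return -1 below the lower threshold, 1 above the upper one, else 0.

    The +/-2 defaults are the values mentioned in the reply above.
    """
    if score < lower:
        return -1
    if score > upper:
        return 1
    return 0

# Normalized scores from the three example rows posted in this thread.
example_scores = [0.670019, 0.224824, -0.33562]
flags = [crit_flag(s) for s in example_scores]
print(flags)
```

Under these assumed thresholds none of the three example scores falls outside the range from -2 to 2, which fits the remark that the crit values in the posted file are unexpected.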

Yes, you can use the normxpehh column for your Manhattan plot. People sometimes treat these values as z-scores and convert them to p-values based on a standard normal distribution.
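A minimal sketch of that conversion, assuming the normalized scores can be treated as draws from a standard normal distribution and that a two-sided test is wanted (both are assumptions, not something `norm` does for you). The function name `two_sided_p` is hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value of a z-score under a standard normal distribution."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1)
    return math.erfc(abs(z) / math.sqrt(2.0))

# Normalized scores from the example rows posted in this thread.
scores = {
    "OAR6_5062320.1": 0.670019,
    "OAR6_5348444_X.1": 0.224824,
    "s63279.1": -0.33562,
}

for snp, z in scores.items():
    p = two_sided_p(z)
    # -log10(p) is the usual y-axis value for a Manhattan plot.
    print(f"{snp}\tz={z:+.4f}\tp={p:.4f}\t-log10(p)={-math.log10(p):.3f}")
```

For a one-sided question (for example, selection in only one of the two populations), half of `math.erfc(z / math.sqrt(2.0))` for the signed score would be the analogous upper-tail probability.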

Regarding the follow-up question, I prefer to use the normalized score, but for the XP stats it isn’t strictly necessary to use it.

Zachary

On Fri, Jun 7, 2024 at 07:30, Lesya @.***> wrote:

Hello! I am using the following code to normalize my xp-nSL data:

for reg in $(seq 1 26) ; do norm --xpnsl --files xpnslRusAust$reg.xpnsl.out ; done

As a result I get this file:

| id | pos | gpos | p1 | sL1 | p2 | sL2 | xpnsl | normxpehh | crit |
|---|---|---|---|---|---|---|---|---|---|
| OAR6_5062320.1 | 5062320 | 26 | 0.273585 | 1.18328 | 0.40942 | 1.00629 | 0.070365 | 0.670019 | -1 |
| OAR6_5348444_X.1 | 5348445 | 27 | 0.377358 | 1.23887 | 0.365942 | 1.12419 | 0.042185 | 0.224824 | 0 |
| s63279.1 | 5521727 | 28 | 0.150943 | 1.68338 | 0.210145 | 1.65757 | 0.006709 | -0.33562 | 1 |

First, I am confused that the column with normalized data is headed normxpehh and not normxpnsl. Is this a header error, or is something wrong with the data?

Second, I don't fully understand the meaning of the crit column. Does -1 mark the lowest 1% of normxpehh normalized values on this chromosome, and 1 the top 1%?

Also, can I use the data from the normxpehh column to build an overall Manhattan plot of the normalized values? Thank you so much for your work and for your recommendations!

Best regards, Olesya


malteze2024 commented 3 months ago

Thank you very much! The problem with the column header went away after I downloaded the updated version of norm. Best regards, Olesya