szpiech / selscan

Haplotype-based scans for selection
GNU General Public License v3.0

Standardized scores do not appear for every SNP when norm is used #110

Open · spchavan10 opened this issue 5 months ago

spchavan10 commented 5 months ago

Hello there, I'm an animal genetics student currently working with Bovine50kSNP genotype data. After I run norm on my .ihs.out file, why doesn't it give standardized iHS scores for all the SNPs in the file? For example, chr1.ihs.out contains 1894 SNPs, but chr1.ihs.out.100bins shows scores for only 1560 SNPs.

szpiech commented 5 months ago

Hello,

This can happen if fewer than 20 sites fall within a given frequency bin: norm standardizes the scores in a bin only if that bin contains at least 20 scores. If that is the case for you, try reducing the number of frequency bins you are using.

-Zachary

(Email reply to spchavan10's post of Mon, Apr 29, 2024 at 1:07 AM.)
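To make the bin rule concrete, here is a minimal Python sketch of per-frequency-bin standardization that leaves a site unscored when its bin holds fewer than 20 scores. It is not norm's source code: the equal-width, right-closed binning, the use of the sample variance, and the function name normalize_by_freq_bin are assumptions for illustration only.

```python
import math
from collections import defaultdict

MIN_SITES_PER_BIN = 20  # threshold stated in the thread


def normalize_by_freq_bin(freqs, scores, nbins=100, min_sites=MIN_SITES_PER_BIN):
    """Standardize scores within allele-frequency bins (hypothetical sketch).

    Returns a list parallel to `scores`; a site gets None when its bin holds
    fewer than `min_sites` scores (or has zero variance), mirroring sites that
    are missing from the normalized output.
    Assumed binning: equal-width bins on (0, 1]; bin k covers (k/nbins, (k+1)/nbins].
    """
    def bin_of(p):
        # ceil(p * nbins) - 1 maps (k/nbins, (k+1)/nbins] to index k, clamped to range
        return min(max(math.ceil(p * nbins) - 1, 0), nbins - 1)

    members = defaultdict(list)  # bin index -> positions of the sites in that bin
    for i, p in enumerate(freqs):
        members[bin_of(p)].append(i)

    out = [None] * len(scores)
    for idx in members.values():
        if len(idx) < max(min_sites, 2):
            continue  # too few sites: leave these unstandardized
        vals = [scores[i] for i in idx]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / (len(vals) - 1))
        if sd == 0:
            continue  # a constant bin cannot be standardized
        for i in idx:
            out[i] = (scores[i] - mean) / sd
    return out
```

Lowering nbins widens each bin, so more bins can clear the 20-site threshold; that is the reasoning behind the suggestion to use fewer frequency bins.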

spchavan10 commented 5 months ago

Thank you for the insight, but even when I reduce the number of frequency bins, the result stays exactly the same. When calculating iHS in selscan, I used flags such as --max-gap, --gap-scale, --pmap, --maf, --trunc-ok, and --cutoff to retain as many SNPs as possible in the unstandardized output file. I want to incorporate these scores into DCMS by converting them into p-values, which is why I'm trying to keep all the SNPs in the output.
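For the DCMS step, one way to turn standardized scores into p-values is sketched below in Python. Both the normal-theory two-sided conversion and the rank-based alternative are assumptions; which one your DCMS workflow expects (and whether it wants one- or two-sided values) should be checked against that method's own description. Sites that norm left unstandardized are represented as None and passed through.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a score assumed to follow N(0, 1)."""
    return math.erfc(abs(z) / math.sqrt(2))


def empirical_p(scores):
    """Rank-based alternative: share of |scores| at least as extreme as each one.

    None entries (unstandardized sites) are passed through as None.
    Quadratic time for clarity; sort first for genome-wide inputs.
    """
    observed = [abs(s) for s in scores if s is not None]
    n = len(observed)
    return [None if s is None else sum(1 for t in observed if t >= abs(s)) / n
            for s in scores]


if __name__ == "__main__":
    zs = [0.0, 1.96, -2.5]
    print([round(two_sided_p(z), 4) for z in zs])
    print(empirical_p([0.5, -2.0, None, 1.0]))
```

The normal-theory version only makes sense if the standardized scores are roughly N(0, 1); the rank-based version makes no distributional assumption but depends on which sites are included.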

spchavan10 commented 5 months ago

Where can I find the manual for norm?

szpiech commented 5 months ago

Hello,

Unfortunately, I think the only documentation I've written so far for norm is in the changelog and in --help. This will need to change.

Can you send your norm log file? I'll try to troubleshoot, but I have limited time before I go on leave at the end of the week.

Zachary

(Email reply to Shambhuraditya Chavan's message of Tue, Apr 30, 2024 at 5:41 AM.)

spchavan10 commented 5 months ago

./norm --ihs --files chromo1.ihs.out
norm v1.3.0
You have provided 1 output files for joint normalization.
Opened chromo1.ihs.out

Total loci: 1894
Reading all data.
Calculating mean and variance per frequency bin:

bin num mean variance
0.01 0 -nan -nan
0.02 201 -0.942456 0.0331806
0.03 0 -nan -nan
0.04 153 0.202392 0.116415
0.05 0 -nan -nan
0.06 120 0.151216 0.0700112
0.07 0 -nan -nan
0.08 107 0.169515 0.0389571
0.09 0 -nan -nan
0.1 113 0.228969 0.0594271
0.11 0 -nan -nan
0.12 88 0.197439 0.0371951
0.13 77 0.155727 0.0376023
0.14 0 -nan -nan
0.15 62 0.121447 0.0296197
0.16 0 -nan -nan
0.17 58 0.108081 0.0347496
0.18 0 -nan -nan
0.19 49 0.153233 0.0225802
0.2 0 -nan -nan
0.21 62 0.0996912 0.0190187
0.22 0 -nan -nan
0.23 43 0.108986 0.0256443
0.24 0 -nan -nan
0.25 41 0.085957 0.0129141
0.26 31 0.0990113 0.0119576
0.27 0 -nan -nan
0.28 25 0.0350351 0.0224791
0.29 0 -nan -nan
0.3 17 0.0756617 0.0147399
0.31 0 -nan -nan
0.32 25 0.0648255 0.0219701
0.33 0 -nan -nan
0.34 17 0.0989446 0.0155861
0.35 0 -nan -nan
0.36 20 0.0618536 0.0244242
0.37 0 -nan -nan
0.38 16 0.0100937 0.0110296
0.39 11 0.034661 0.0255826
0.4 0 -nan -nan
0.41 20 0.0504288 0.0104644
0.42 0 -nan -nan
0.43 17 0.0261363 0.0178654
0.44 0 -nan -nan
0.45 16 -0.104928 0.0097222
0.46 0 -nan -nan
0.47 11 -0.0363621 0.0161006
0.48 0 -nan -nan
0.49 20 -0.011081 0.0213475
0.5 0 -nan -nan
0.51 13 -0.00641709 0.0121569
0.52 17 0.0351954 0.0321013
0.53 0 -nan -nan
0.54 24 0.0141019 0.0218387
0.55 0 -nan -nan
0.56 35 -0.0504989 0.0306169
0.57 0 -nan -nan
0.58 25 -0.041707 0.0193195
0.59 0 -nan -nan
0.6 11 -0.0530253 0.019896
0.61 0 -nan -nan
0.62 18 -0.0725664 0.0128914
0.63 14 -0.0723302 0.0110694
0.64 0 -nan -nan
0.65 16 -0.0353695 0.0145087
0.66 0 -nan -nan
0.67 25 -0.0646973 0.0264331
0.68 0 -nan -nan
0.69 16 -0.0806662 0.0161506
0.7 0 -nan -nan
0.71 23 -0.0510331 0.017164
0.72 0 -nan -nan
0.73 17 -0.110549 0.0167697
0.74 0 -nan -nan
0.75 21 -0.111374 0.0296077
0.76 20 -0.111592 0.0239563
0.77 0 -nan -nan
0.78 18 -0.226333 0.0147349
0.79 0 -nan -nan
0.8 21 -0.141272 0.0764032
0.81 0 -nan -nan
0.82 20 -0.166597 0.0491221
0.83 0 -nan -nan
0.84 15 -0.197985 0.06345
0.85 0 -nan -nan
0.86 13 -0.140425 0.030714
0.87 0 -nan -nan
0.88 30 -0.239987 0.029849
0.89 12 -0.186248 0.027718
0.9 0 -nan -nan
0.91 19 -0.183821 0.0668138
0.92 0 -nan -nan
0.93 13 -0.140662 0.0534552
0.94 0 -nan -nan
0.95 10 -0.238264 0.0849062
0.96 0 -nan -nan
0.97 3 -0.186794 0.0100847
0.98 0 -nan -nan
0.99 5 0.902356 0.0553932
1 0 -nan -nan

Normalizing chromo1.ihs.out
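The bin counts in this log already account for most of the missing scores. The Python sketch below copies the num column from the log (empty bins omitted) and applies the at-least-20-sites rule described earlier in the thread. It reports 1559 normalizable sites, one fewer than the 1560 mentioned in the original post; that one-line gap might be a header line in the .100bins file, but this is a guess that has not been checked.

```python
# "num" column from the chromo1.ihs.out norm log (allele-frequency bin -> site count);
# bins with 0 sites are omitted.
BIN_COUNTS = {
    0.02: 201, 0.04: 153, 0.06: 120, 0.08: 107, 0.10: 113, 0.12: 88,
    0.13: 77, 0.15: 62, 0.17: 58, 0.19: 49, 0.21: 62, 0.23: 43,
    0.25: 41, 0.26: 31, 0.28: 25, 0.30: 17, 0.32: 25, 0.34: 17,
    0.36: 20, 0.38: 16, 0.39: 11, 0.41: 20, 0.43: 17, 0.45: 16,
    0.47: 11, 0.49: 20, 0.51: 13, 0.52: 17, 0.54: 24, 0.56: 35,
    0.58: 25, 0.60: 11, 0.62: 18, 0.63: 14, 0.65: 16, 0.67: 25,
    0.69: 16, 0.71: 23, 0.73: 17, 0.75: 21, 0.76: 20, 0.78: 18,
    0.80: 21, 0.82: 20, 0.84: 15, 0.86: 13, 0.88: 30, 0.89: 12,
    0.91: 19, 0.93: 13, 0.95: 10, 0.97: 3, 0.99: 5,
}
MIN_SITES = 20  # bins below this size are not normalized

total = sum(BIN_COUNTS.values())
kept = sum(n for n in BIN_COUNTS.values() if n >= MIN_SITES)
small_bins = sorted(f for f, n in BIN_COUNTS.items() if n < MIN_SITES)
print(f"total={total} normalized={kept} dropped={total - kept} small_bins={len(small_bins)}")
# prints: total=1894 normalized=1559 dropped=335 small_bins=24
```

Note also that every other bin is empty, which suggests the allele frequencies in this sample take only a limited set of discrete values; simply halving the number of bins may mostly merge filled bins with empty ones rather than raise their counts.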

spchavan10 commented 5 months ago

I'll email you the map, VCF, iHS, and norm output files.

spchavan10 commented 5 months ago

As mentioned earlier, I'll be using these scores for DCMS estimation. That's why I'm trying to retrieve all the SNPs.

(Email reply to Zachary A Szpiech's message of Tue, 30 Apr 2024 at 16:34.)

szpiech commented 5 months ago

Hello,

OK, this shows that many frequency bins have fewer than 20 sites (the num column, second in that bin/num/mean/variance table). Are you normalizing only one chromosome? The recommended usage is to provide all chromosomes at once for joint normalization, e.g. --files chromo*.ihs.out

Zachary

(Email reply to Shambhuraditya Chavan's message of Tue, Apr 30, 2024 at 7:28 AM, which contained the norm log above.)
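The difference between joint normalization and normalizing each chromosome separately can be sketched in Python as the commands each approach issues. Only --ihs and --files come from this thread; the ./norm path, the chromo*.ihs.out pattern, and the helper names build_norm_command and build_loop_commands are assumptions. Presumably, a single invocation with all files lets norm pool every chromosome's scores into the same frequency bins, whereas a loop runs norm once per file, so each chromosome's bins stay as small as in the chr1 log.

```python
import glob
import subprocess


def build_norm_command(files, norm_path="./norm"):
    """One norm call over all files: scores from every chromosome share the bins."""
    return [norm_path, "--ihs", "--files", *sorted(files)]


def build_loop_commands(files, norm_path="./norm"):
    """One norm call per file (what a shell loop does): each file is binned alone."""
    return [[norm_path, "--ihs", "--files", f] for f in sorted(files)]


if __name__ == "__main__":
    files = glob.glob("chromo*.ihs.out")
    if files:
        # Joint normalization: a single process sees all chromosomes at once.
        subprocess.run(build_norm_command(files), check=True)
```

In a shell, the equivalent of build_norm_command is simply passing the glob to --files in one call rather than wrapping norm in a for loop.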

spchavan10 commented 5 months ago

Oh, okay, I'll do that. Earlier I tried doing this with a loop, but the results stayed the same.