szpiech / selscan

Haplotype based scans for selection
GNU General Public License v3.0
114 stars 33 forks source link

strange iHS values #107

Closed QianXiaobo closed 9 months ago

QianXiaobo commented 10 months ago

Hi Szpiech,

I used selscan --ihs --vcf $CHR.vcf.gz --pmap --threads 4 --unphased --out $OUT for autosomes. But I got very strange values of iHS as showed below:

rs6111385       76962   0.733333        2.76259 3.78571 -3.78571
rs2196239       80655   0.9     3.05153 2.50728 3.05153
rs34365171      82012   0.322222        3.44683 2.86232 3.44683
rs1858595       90008   0.277778        3.59731 2.63385 3.59731
rs1857092       94952   0.0888889       3.41047 2.91878 3.41047
rs6052070       96931   0.155556        3.3968  3.17345 3.3968
rs6515824       97122   0.0888889       3.41047 2.91857 3.41047
rs6052094       97394   0.155556        3.3968  3.17345 3.3968
rs6116135       98930   0.0777778       3.4056  2.81074 3.4056
rs6139361       101362  0.344444        3.35227 2.34399 3.35227
rs8182936       112877  0.3     3.56361 2.45486 3.56361
rs1342137       125121  0.133333        1.68481 2.98877 -2.98877
rs6133348       126529  0.533333        2.89004 2.61925 2.89004
rs6117403       126600  0.588889        2.67721 2.74363 -2.74363
rs6117404       126614  0.588889        2.67721 2.74363 -2.74363
rs6117416       126730  0.588889        2.67721 2.74363 -2.74363
rs6117417       126760  0.588889        2.67721 2.74363 -2.74363
rs73600341      126914  0.0666667       1.10486 2.61384 -2.61384
rs4815938       126923  0.655556        2.32151 3.09265 -3.09265
rs6085677       126930  0.655556        2.32151 3.09265 -3.09265
rs753217        127720  0.633333        2.29478 3.18024 -3.18024
rs16994392      127952  0.344444        3.07952 2.33642 3.07952
rs735669        128713  0.588889        2.4876  2.74966 -2.74966
rs78031171      128863  0.277778        3.31565 1.96099 3.31565
rs16994490      129063  0.344444        3.09187 2.27372 3.09187
rs12480202      129635  0.0666667       2.02509 2.4354  -2.4354
rs62190499      130133  0.288889        3.47126 2.17333 3.47126

The absolute value of iHS is equal to that of ihh0 or ihh1. I would like to know what's wrong with my result. Thanks

Bests, Xb

szpiech commented 10 months ago

Hi Xb,

Thanks for pointing this out, as I completely forgot to note in my update that columns 4 and 5 with --unphased set are actually reporting different information than when --unphased is not set.

If you look at the selscan 2 paper ( https://doi.org/10.1093/bioinformatics/btae006) on page 2, you can find the equation that defines unphased iHS. This is what is reported on column 6. With --unphased set, column 4 gives the value iHS_2 as defined in the paper, and column 5 gives iHS_0 as defined in the paper. These are, of course, different than the iHH_1 and iHH_0 values which are reported in the case where --unphased is not set.

Sorry for this confusion, I will need to update the README and manual to reflect this difference.

Zachary

On Thu, Jan 11, 2024 at 1:03 AM XB @.***> wrote:

Hi Szpiech,

I used selscan --ihs --vcf $CHR.vcf.gz --pmap --threads 4 --unphased --out $OUT for autosomes. But I got very strange values of iHS as showed below:

rs6111385 76962 0.733333 2.76259 3.78571 -3.78571 rs2196239 80655 0.9 3.05153 2.50728 3.05153 rs34365171 82012 0.322222 3.44683 2.86232 3.44683 rs1858595 90008 0.277778 3.59731 2.63385 3.59731 rs1857092 94952 0.0888889 3.41047 2.91878 3.41047 rs6052070 96931 0.155556 3.3968 3.17345 3.3968 rs6515824 97122 0.0888889 3.41047 2.91857 3.41047 rs6052094 97394 0.155556 3.3968 3.17345 3.3968 rs6116135 98930 0.0777778 3.4056 2.81074 3.4056 rs6139361 101362 0.344444 3.35227 2.34399 3.35227 rs8182936 112877 0.3 3.56361 2.45486 3.56361 rs1342137 125121 0.133333 1.68481 2.98877 -2.98877 rs6133348 126529 0.533333 2.89004 2.61925 2.89004 rs6117403 126600 0.588889 2.67721 2.74363 -2.74363 rs6117404 126614 0.588889 2.67721 2.74363 -2.74363 rs6117416 126730 0.588889 2.67721 2.74363 -2.74363 rs6117417 126760 0.588889 2.67721 2.74363 -2.74363 rs73600341 126914 0.0666667 1.10486 2.61384 -2.61384 rs4815938 126923 0.655556 2.32151 3.09265 -3.09265 rs6085677 126930 0.655556 2.32151 3.09265 -3.09265 rs753217 127720 0.633333 2.29478 3.18024 -3.18024 rs16994392 127952 0.344444 3.07952 2.33642 3.07952 rs735669 128713 0.588889 2.4876 2.74966 -2.74966 rs78031171 128863 0.277778 3.31565 1.96099 3.31565 rs16994490 129063 0.344444 3.09187 2.27372 3.09187 rs12480202 129635 0.0666667 2.02509 2.4354 -2.4354 rs62190499 130133 0.288889 3.47126 2.17333 3.47126

The absolute value of iHS is equal to that of ihh0 or ihh1. I would like to know what's wrong with my result. Thanks

Bests, Xb

— Reply to this email directly, view it on GitHub https://github.com/szpiech/selscan/issues/107, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABAKRQW3IFX2G5VN64OY6N3YN56CFAVCNFSM6AAAAABBV5WM5SVHI2DSMVQWIX3LMV43ASLTON2WKOZSGA3TKOBSGEZDENY . You are receiving this because you are subscribed to this thread.Message ID: @.***>

QianXiaobo commented 10 months ago

Hi Zachary,

Thanks a lot! By the way, I am confused that using ref/alt instead of ancestral/derived will affect the result or not?

Bests, Xb

szpiech commented 10 months ago

Hello,

So, the quick answer is that for the single population statistics (iHS/nSL) it matters a little bit, and for the two-population statistics (XPEHH/XPnSL) it doesn't matter much at all. There's a nice exploration of this effect in this paper: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0262024

-Zachary

On Thu, Jan 11, 2024 at 11:36 PM XB @.***> wrote:

Hi Zachary,

Thanks a lot! By the way, I am confused that using ref/alt instead of ancestral/derived will affect the result or not?

Bests, Xb

— Reply to this email directly, view it on GitHub https://github.com/szpiech/selscan/issues/107#issuecomment-1888419357, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABAKRQVQ7VQPYU4O6RON6Y3YOC4UDAVCNFSM6AAAAABBV5WM5SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOBYGQYTSMZVG4 . You are receiving this because you commented.Message ID: @.***>

QianXiaobo commented 10 months ago

Hi Zachary,

Thanks for your reply again! I have tried to convert ref/alt to anc/der in my vcf file. I noticed that not all called sites in my vcf had ancestral allele information. So, those sites with no ancestral allele information should be dropped or not when calculating iHS for one population?

Best wishes, Xiaobo

szpiech commented 10 months ago

Hi Xiaobo,

Hmm, good question. I'm not sure how this choice would affect the statistic precisely, but I think either way would probably be fine.

-Zachary

On Thu, Jan 18, 2024 at 5:02 AM XB @.***> wrote:

Hi Zachary,

Thanks for your reply again! I have tried to convert ref/alt to anc/der in my vcf file. I noticed that not all called sites in my vcf had ancestral allele information. So, those sites with no ancestral allele information should be dropped or not when calculating iHS for one population?

Best wishes, Xiaobo

— Reply to this email directly, view it on GitHub https://github.com/szpiech/selscan/issues/107#issuecomment-1898162353, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABAKRQSH3J5N6DWTTAWBBP3YPDXKDAVCNFSM6AAAAABBV5WM5SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYGE3DEMZVGM . You are receiving this because you commented.Message ID: @.***>

QianXiaobo commented 9 months ago

Hi Zachary,

Thanks for your patience! I used VCF as input file with anc/der information as "AA=" in INFO column, just like this:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Phased genotypes">
...
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  2902026413
chr10   62082   rs61838557      G       T       .       .       AA=.;AC=55;AF=0.0878594;CM=0.00186052   GT      0|0
chr10   62540   .       C       T       .       .       AA=.;AC=20;AF=0.0319489;CM=0.00247311   GT      0|0
chr10   62554   rs577354278     T       C       .       .       AA=.;AC=2;AF=0.00319489;CM=0.00249183   GT      0|0
...
chr10   94856   .       C       A       .       .       AA=C;AC=15;AF=0.0239617;CM=0.0189228    GT      0|0
chr10   94870   rs11251919      A       G       .       .       AA=G;AC=113;AF=0.180511;CM=0.018925     GT      0|0
chr10   94872   rs11251920      A       C       .       .       AA=C;AC=113;AF=0.180511;CM=0.0189253    GT      0|0
...

It is a little difficult for me to read the script coding by cpp. I wonder that selscan used "AA" (means ancestral allele) in INFO column or not when calculating iHS. Or should I convert 0|0 to 1|1 by myself if ancestral allele isn't as reference allele?

Best wishes, Xiaobo

szpiech commented 9 months ago

Hi Xiaobo,

You will need to recode them yourself, unfortunately.

Zachary

Le lun. 22 janv. 2024 à 10:27 PM, XB @.***> a écrit :

Hi Zachary,

Thanks for your patient! I used VCF as input file with anc/der information as "AA=" in INFO column, just like this:

fileformat=VCFv4.2

FILTER=

FORMAT=

...

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 2902026413

chr10 62082 rs61838557 G T . . AA=.;AC=55;AF=0.0878594;CM=0.00186052 GT 0|0 chr10 62540 . C T . . AA=.;AC=20;AF=0.0319489;CM=0.00247311 GT 0|0 chr10 62554 rs577354278 T C . . AA=.;AC=2;AF=0.00319489;CM=0.00249183 GT 0|0 ... chr10 94856 . C A . . AA=C;AC=15;AF=0.0239617;CM=0.0189228 GT 0|0 chr10 94870 rs11251919 A G . . AA=G;AC=113;AF=0.180511;CM=0.018925 GT 0|0 chr10 94872 rs11251920 A C . . AA=C;AC=113;AF=0.180511;CM=0.0189253 GT 0|0 ...

It is a little difficult for me to read the script coding by cpp. I wonder that selscan used "AA" (means ancestral allele) in INFO column or not when calculating iHS. Or should I convert 0|0 to 1|1 by myself if ancestral allele isn't as reference allele?

Best wishes, Xiaobo

— Reply to this email directly, view it on GitHub https://github.com/szpiech/selscan/issues/107#issuecomment-1905229883, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABAKRQSHRKBZ2MGEVA2FF73YP4U2JAVCNFSM6AAAAABBV5WM5SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMBVGIZDSOBYGM . You are receiving this because you commented.Message ID: @.***>