Closed QianXiaobo closed 9 months ago
Hi Xb,
Thanks for pointing this out, as I completely forgot to note in my update that columns 4 and 5 with --unphased set are actually reporting different information than when --unphased is not set.
If you look at the selscan 2 paper ( https://doi.org/10.1093/bioinformatics/btae006) on page 2, you can find the equation that defines unphased iHS. This is what is reported on column 6. With --unphased set, column 4 gives the value iHS_2 as defined in the paper, and column 5 gives iHS_0 as defined in the paper. These are, of course, different than the iHH_1 and iHH_0 values which are reported in the case where --unphased is not set.
Sorry for this confusion, I will need to update the README and manual to reflect this difference.
Zachary
On Thu, Jan 11, 2024 at 1:03 AM XB @.***> wrote:
Hi Szpiech,
I used selscan --ihs --vcf $CHR.vcf.gz --pmap --threads 4 --unphased --out $OUT for autosomes. But I got very strange values of iHS as showed below:
rs6111385 76962 0.733333 2.76259 3.78571 -3.78571 rs2196239 80655 0.9 3.05153 2.50728 3.05153 rs34365171 82012 0.322222 3.44683 2.86232 3.44683 rs1858595 90008 0.277778 3.59731 2.63385 3.59731 rs1857092 94952 0.0888889 3.41047 2.91878 3.41047 rs6052070 96931 0.155556 3.3968 3.17345 3.3968 rs6515824 97122 0.0888889 3.41047 2.91857 3.41047 rs6052094 97394 0.155556 3.3968 3.17345 3.3968 rs6116135 98930 0.0777778 3.4056 2.81074 3.4056 rs6139361 101362 0.344444 3.35227 2.34399 3.35227 rs8182936 112877 0.3 3.56361 2.45486 3.56361 rs1342137 125121 0.133333 1.68481 2.98877 -2.98877 rs6133348 126529 0.533333 2.89004 2.61925 2.89004 rs6117403 126600 0.588889 2.67721 2.74363 -2.74363 rs6117404 126614 0.588889 2.67721 2.74363 -2.74363 rs6117416 126730 0.588889 2.67721 2.74363 -2.74363 rs6117417 126760 0.588889 2.67721 2.74363 -2.74363 rs73600341 126914 0.0666667 1.10486 2.61384 -2.61384 rs4815938 126923 0.655556 2.32151 3.09265 -3.09265 rs6085677 126930 0.655556 2.32151 3.09265 -3.09265 rs753217 127720 0.633333 2.29478 3.18024 -3.18024 rs16994392 127952 0.344444 3.07952 2.33642 3.07952 rs735669 128713 0.588889 2.4876 2.74966 -2.74966 rs78031171 128863 0.277778 3.31565 1.96099 3.31565 rs16994490 129063 0.344444 3.09187 2.27372 3.09187 rs12480202 129635 0.0666667 2.02509 2.4354 -2.4354 rs62190499 130133 0.288889 3.47126 2.17333 3.47126
The absolute value of iHS is equal to that of ihh0 or ihh1. I would like to know what's wrong with my result. Thanks
Bests, Xb
— Reply to this email directly, view it on GitHub https://github.com/szpiech/selscan/issues/107, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABAKRQW3IFX2G5VN64OY6N3YN56CFAVCNFSM6AAAAABBV5WM5SVHI2DSMVQWIX3LMV43ASLTON2WKOZSGA3TKOBSGEZDENY . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Hi Zachary,
Thanks a lot! By the way, I am confused that using ref/alt instead of ancestral/derived will affect the result or not?
Bests, Xb
Hello,
So, the quick answer is that for the single population statistics (iHS/nSL) it matters a little bit, and for the two-population statistics (XPEHH/XPnSL) it doesn't matter much at all. There's a nice exploration of this effect in this paper: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0262024
-Zachary
On Thu, Jan 11, 2024 at 11:36 PM XB @.***> wrote:
Hi Zachary,
Thanks a lot! By the way, I am confused that using ref/alt instead of ancestral/derived will affect the result or not?
Bests, Xb
— Reply to this email directly, view it on GitHub https://github.com/szpiech/selscan/issues/107#issuecomment-1888419357, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABAKRQVQ7VQPYU4O6RON6Y3YOC4UDAVCNFSM6AAAAABBV5WM5SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOBYGQYTSMZVG4 . You are receiving this because you commented.Message ID: @.***>
Hi Zachary,
Thanks for your reply again! I have tried to convert ref/alt to anc/der in my vcf file. I noticed that not all called sites in my vcf had ancestral allele information. So, those sites with no ancestral allele information should be dropped or not when calculating iHS for one population?
Best wishes, Xiaobo
Hi Xiaobo,
Hmm, good question. I'm not sure how this choice would affect the statistic precisely, but I think either way would probably be fine.
-Zachary
On Thu, Jan 18, 2024 at 5:02 AM XB @.***> wrote:
Hi Zachary,
Thanks for your reply again! I have tried to convert ref/alt to anc/der in my vcf file. I noticed that not all called sites in my vcf had ancestral allele information. So, those sites with no ancestral allele information should be dropped or not when calculating iHS for one population?
Best wishes, Xiaobo
— Reply to this email directly, view it on GitHub https://github.com/szpiech/selscan/issues/107#issuecomment-1898162353, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABAKRQSH3J5N6DWTTAWBBP3YPDXKDAVCNFSM6AAAAABBV5WM5SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYGE3DEMZVGM . You are receiving this because you commented.Message ID: @.***>
Hi Zachary,
Thanks for your patience! I used VCF as input file with anc/der information as "AA=" in INFO column, just like this:
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Phased genotypes">
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 2902026413
chr10 62082 rs61838557 G T . . AA=.;AC=55;AF=0.0878594;CM=0.00186052 GT 0|0
chr10 62540 . C T . . AA=.;AC=20;AF=0.0319489;CM=0.00247311 GT 0|0
chr10 62554 rs577354278 T C . . AA=.;AC=2;AF=0.00319489;CM=0.00249183 GT 0|0
...
chr10 94856 . C A . . AA=C;AC=15;AF=0.0239617;CM=0.0189228 GT 0|0
chr10 94870 rs11251919 A G . . AA=G;AC=113;AF=0.180511;CM=0.018925 GT 0|0
chr10 94872 rs11251920 A C . . AA=C;AC=113;AF=0.180511;CM=0.0189253 GT 0|0
...
It is a little difficult for me to read the script coding by cpp. I wonder that selscan used "AA" (means ancestral allele) in INFO column or not when calculating iHS. Or should I convert 0|0
to 1|1
by myself if ancestral allele isn't as reference allele?
Best wishes, Xiaobo
Hi Xiaobo,
You will need to recode them yourself, unfortunately.
Zachary
Le lun. 22 janv. 2024 à 10:27 PM, XB @.***> a écrit :
Hi Zachary,
Thanks for your patient! I used VCF as input file with anc/der information as "AA=" in INFO column, just like this:
fileformat=VCFv4.2
FILTER=
FORMAT=
...
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 2902026413
chr10 62082 rs61838557 G T . . AA=.;AC=55;AF=0.0878594;CM=0.00186052 GT 0|0 chr10 62540 . C T . . AA=.;AC=20;AF=0.0319489;CM=0.00247311 GT 0|0 chr10 62554 rs577354278 T C . . AA=.;AC=2;AF=0.00319489;CM=0.00249183 GT 0|0 ... chr10 94856 . C A . . AA=C;AC=15;AF=0.0239617;CM=0.0189228 GT 0|0 chr10 94870 rs11251919 A G . . AA=G;AC=113;AF=0.180511;CM=0.018925 GT 0|0 chr10 94872 rs11251920 A C . . AA=C;AC=113;AF=0.180511;CM=0.0189253 GT 0|0 ...
It is a little difficult for me to read the script coding by cpp. I wonder that selscan used "AA" (means ancestral allele) in INFO column or not when calculating iHS. Or should I convert 0|0 to 1|1 by myself if ancestral allele isn't as reference allele?
Best wishes, Xiaobo
— Reply to this email directly, view it on GitHub https://github.com/szpiech/selscan/issues/107#issuecomment-1905229883, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABAKRQSHRKBZ2MGEVA2FF73YP4U2JAVCNFSM6AAAAABBV5WM5SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMBVGIZDSOBYGM . You are receiving this because you commented.Message ID: @.***>
Hi Szpiech,
I used
selscan --ihs --vcf $CHR.vcf.gz --pmap --threads 4 --unphased --out $OUT
for autosomes. But I got very strange values of iHS as showed below:The absolute value of iHS is equal to that of ihh0 or ihh1. I would like to know what's wrong with my result. Thanks
Bests, Xb