szpiech / selscan

Haplotype based scans for selection
GNU General Public License v3.0

How to set the window size and step size for SNP files #111

Status: Open. hjt1129 opened this issue 2 months ago.

hjt1129 commented 2 months ago

Hi, we have SNP data. How can we calculate XP-EHH values with a specific window size and step size? I found the "--pi" option, but it does not seem to work when I set it with the following command:
selscan --xpehh --vcf vcf.file --vcf-ref vcf_ref.file --map map.file --threads 8 --pi --pi-win 30000 --out out.file

szpiech commented 2 months ago

Hello,

XP-EHH isn't computed in windows; it is computed SNP by SNP. If you only want to compute these scores for a subset of the genome, I suppose you'd have to subset your VCF before giving it to selscan. Is this what you were asking?
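As an illustration of "subset your VCF first", here is a minimal Python sketch, assuming an uncompressed, tab-delimited VCF where POS is the second column. The function name `subset_vcf` and its arguments are hypothetical, not part of selscan. It keeps every header line and only the records on one chromosome within an inclusive position range. For bgzipped, indexed files, `bcftools view -r chrom:start-end` does the same job. Presumably the same region should be applied to both the `--vcf` and `--vcf-ref` inputs (and, we assume, to the `--map` file) so the inputs stay consistent.

```python
from typing import Iterable, Iterator


def subset_vcf(lines: Iterable[str], chrom: str, start: int, end: int) -> Iterator[str]:
    """Yield header lines plus records with CHROM == chrom and start <= POS <= end.

    Hypothetical helper (not part of selscan). Assumes an uncompressed,
    tab-delimited VCF in which POS is the second column.
    """
    if start > end:
        raise ValueError("start must be <= end")
    for line in lines:
        if line.startswith("#"):
            # Keep meta-information (##) lines and the #CHROM header unchanged.
            yield line
            continue
        if not line.strip():
            continue  # skip blank lines
        # Only CHROM and POS are needed, so split at most twice.
        fields = line.split("\t", 2)
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield line
```

Usage, with hypothetical file names: `with open("vcf.file") as src, open("vcf.region.file", "w") as dst: dst.writelines(subset_vcf(src, "1", 1_000_000, 2_000_000))`. Because it is a generator, it streams the file instead of loading it into memory.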

Zachary

(In reply to hjt1129's message of Fri, 10 May 2024, 07:36.)

hjt1129 commented 2 months ago

Hi, thanks for your reply. No, I just want to calculate XP-EHH across the whole-genome VCF in windows, the way Fst is calculated with VCFtools using a window size and a step size. So, as you said, this seems unachievable for XP-EHH. One more question: I see that the norm program can normalize XP-EHH values in windows with the "--bp-win" parameter. What are the underlying rules for the normalization? And do the columns "gt: the fraction of XP-EHH scores > 2" and "lt: the fraction of XP-EHH scores < -2" mean the frequency of XP-EHH scores > 2 and the frequency of XP-EHH scores < -2, respectively?

szpiech commented 2 months ago

Hello,

OK, so the --bp-win parameter converts the raw scores into z-scores and then looks for enrichments of extreme positive or extreme negative scores within windows; this is typically how I identify regions putatively under selection. You can read about this in Szpiech et al. 2021, Evolution Letters. That paper is about XP-nSL, but the approach works the same way for XP-EHH.
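To make the two steps concrete, here is a minimal Python sketch of the idea described above: standardize the per-SNP scores genome-wide into z-scores, then, for each window, report the fraction of SNPs with z > 2 and with z < -2. It is an illustration under stated assumptions, not norm's implementation, and it does not reproduce norm's exact behavior or output format. The function names and the `win`, `step`, and `cutoff` parameters are hypothetical. Windows are half-open `[start, start + win)` intervals beginning at position 0. With `step == win` (the default) they do not overlap; a smaller `step` gives the sliding windows with a step size that the original question asked about, in the style of VCFtools.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple


def zscores(scores: Sequence[float]) -> List[float]:
    """Standardize raw scores genome-wide: (x - mean) / population SD."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance; cannot standardize")
    return [(s - mu) / sd for s in scores]


def window_fractions(
    positions: Sequence[int],
    scores: Sequence[float],
    win: int = 100_000,
    step: Optional[int] = None,
    cutoff: float = 2.0,
) -> List[Tuple[int, int, int, float, float]]:
    """Return (start, end, n_snps, frac_gt, frac_lt) for each non-empty window.

    frac_gt is the fraction of SNPs in [start, end) with z > cutoff;
    frac_lt is the fraction with z < -cutoff. Hypothetical helper.
    """
    step = win if step is None else step
    if win <= 0 or step <= 0:
        raise ValueError("win and step must be positive")
    if len(positions) != len(scores):
        raise ValueError("positions and scores must have the same length")
    if not positions:
        return []
    z = zscores(scores)  # normalize once, genome-wide
    pairs = sorted(zip(positions, z))
    last_pos = pairs[-1][0]
    results = []
    start = 0
    while start <= last_pos:
        end = start + win
        in_win = [v for pos, v in pairs if start <= pos < end]
        if in_win:  # skip windows that contain no SNPs
            n = len(in_win)
            frac_gt = sum(v > cutoff for v in in_win) / n
            frac_lt = sum(v < -cutoff for v in in_win) / n
            results.append((start, end, n, frac_gt, frac_lt))
        start += step
    return results
```

To use it, read the physical-position and raw XP-EHH columns from your selscan output (check the file header for the exact column names) and pass them as `positions` and `scores`. The window scan is quadratic, which is fine for a sketch. For whole genomes, a two-pointer sweep over the sorted positions would be faster.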

Zachary

(In reply to hjt1129's message of Fri, 10 May 2024, 22:04.)