szpiech / selscan

Haplotype based scans for selection
GNU General Public License v3.0

Bimodal distribution of iHS values #108

Open DaveLutgen opened 5 months ago

DaveLutgen commented 5 months ago

Hello,

We have been computing iHS along the genome for a group of birds that hybridize. Each population was chosen to contain 10 non-hybrid individuals. An astonishing pattern that we struggle to explain (see attachment) is a bimodal distribution of absolute iHS values across long stretches for the 3rd, 5th, 8th and 9th populations. Such long stretches recur on different chromosomes and don't always involve the same populations.

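Would you have any insights on what could produce such bimodality in iHS values?

Huge thanks in advance! All the best

[Attached plot: ihs_combined_plot_chr20.png, https://github.com/szpiech/selscan/assets/53175344/2a7c8dd6-b26f-44a8-b2d8-586a5d3fc1fe]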

szpiech commented 5 months ago

Hello,

Could you give me some more information about your data and the command line args you used?

Zachary


DaveLutgen commented 5 months ago

Hello Zachary,

Thanks for your quick reply. The data are whole-genome resequencing data with an average coverage above 15X, generated with 10x linked-read technology. After genotyping with GATK, the linked reads were used for physical phasing, followed by statistical phasing with SHAPEIT4. We are talking about a divergence scale younger than 1 million years, with high degrees of incomplete lineage sorting. Hybridization is abundant across populations, even though the chosen individuals do not include any hybrids.

$HOME/selscan/bin/linux/selscan --ihs --vcf $vcf --out /cluster/scratch/dlutgen/selscan/stats/ihs/$chr"_rmap_"$pop --map $map --threads 

I'm happy to give more information if needed.

All the best, Dave

DaveLutgen commented 5 months ago

Hello Zachary,

I'm forwarding this response here because I'm not sure my GitHub message went through.

Thanks for your quick reply. The data are whole-genome resequencing data with an average coverage above 15X, generated with 10x linked-read technology. After genotyping with GATK, the linked reads were used for physical phasing, followed by statistical phasing with SHAPEIT4. We are talking about a divergence scale younger than 1 million years, with high degrees of incomplete lineage sorting. Hybridization is abundant across populations, even though the chosen individuals do not include any hybrids.

$HOME/selscan/bin/linux/selscan --ihs --vcf $vcf --out /cluster/scratch/dlutgen/selscan/stats/ihs/$chr"_rmap_"$pop --map $map --threads

I'm happy to give more information if needed.

All the best, Dave



szpiech commented 5 months ago

Hi Dave,

Thanks for the reminder, and sorry for the delay. These are definitely some curious patterns. Under neutrality, we would expect normalized iHS scores to be distributed approximately N(0,1), but when there is selection in the genome, we might expect to see multi-modality, with the selected regions having high absolute scores.

Can I first ask whether you normalized these iHS scores in frequency bins, e.g. by using the norm program included with selscan? My guess is yes, based on the values on the y-axis, but I'd like to confirm.
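To make the question concrete, here is a minimal sketch of what normalizing in frequency bins means: each unstandardized score is centered and scaled using the mean and standard deviation of the sites that share its allele-frequency bin. This is only an illustration and not the implementation of the norm program; the function name, the equal-width binning, the default of 20 bins, and the use of the population standard deviation are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_in_freq_bins(freqs, scores, n_bins=20):
    """Standardize scores to mean 0 and sd 1 within allele-frequency bins.

    freqs  : allele frequency of each site, in [0, 1]
    scores : unstandardized score of each site (same order as freqs)
    n_bins : number of equal-width frequency bins (illustrative default)
    """
    # Assign each site to an equal-width bin; a frequency of 1.0 goes in the last bin.
    bin_of = [min(int(f * n_bins), n_bins - 1) for f in freqs]

    # Collect the scores that fall in each bin.
    by_bin = defaultdict(list)
    for b, s in zip(bin_of, scores):
        by_bin[b].append(s)

    # Per-bin mean and (population) standard deviation.
    stats = {b: (mean(v), pstdev(v)) for b, v in by_bin.items()}

    normed = []
    for b, s in zip(bin_of, scores):
        mu, sd = stats[b]
        # A bin with one site or no spread cannot be standardized.
        normed.append((s - mu) / sd if sd > 0 else float("nan"))
    return normed
```

With only 10 individuals per population, some frequency bins may hold few sites, so the per-bin mean and standard deviation can themselves be noisy.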

With smaller sample sizes, we have to be a little careful, as these statistics start to lose some power and the results can be a bit noisier. It looks like you have a cluster of very extreme values spanning 5 Mbp in some of these plots. For comparison, the strongest and best-characterized signal I've seen with iHS is in the human LCT region in northern Europeans, where we see an enrichment spanning ~3 Mbp. So in principle this could be evidence for selection in these regions.
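One way to quantify how far such a cluster extends is to compute, per genomic window, the fraction of sites whose normalized score is extreme. The sketch below is a hedged illustration only: the function name, the 100 kb window size, and the |iHS| > 2 cutoff are assumptions chosen for the example, not values taken from this thread or from selscan.

```python
from collections import defaultdict


def extreme_fraction_by_window(positions, norm_scores, window_bp=100_000, threshold=2.0):
    """Return {window_start: fraction of sites with |score| > threshold}.

    positions   : physical position (bp) of each site
    norm_scores : normalized score of each site (same order as positions)
    """
    # window_start -> [number of extreme sites, total number of sites]
    counts = defaultdict(lambda: [0, 0])
    for pos, score in zip(positions, norm_scores):
        start = (pos // window_bp) * window_bp  # non-overlapping windows
        counts[start][1] += 1
        if abs(score) > threshold:
            counts[start][0] += 1
    return {w: ext / tot for w, (ext, tot) in sorted(counts.items())}
```

A long run of adjacent windows with a high fraction would correspond to the multi-megabase stretches visible in the plot; windows containing very few sites are best interpreted with caution.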

However, it would be worth trying to exclude some other possibilities, mainly odd sequencing artifacts. How is the mappability in these regions? If you have poor mappability, I might be less inclined to believe these patterns.
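A simple check along these lines is to drop sites that fall in low-mappability intervals and see whether the pattern survives. The following is a minimal sketch under stated assumptions: the intervals are hypothetical half-open [start, end) ranges on a single chromosome that you would supply from your own mappability mask, and the function name is made up for illustration.

```python
from bisect import bisect_right


def drop_sites_in_intervals(positions, intervals):
    """Keep the positions that fall outside every half-open [start, end) interval."""
    # Merge overlapping or touching intervals so each position needs one lookup.
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    starts = [s for s, _ in merged]
    kept = []
    for pos in positions:
        # Index of the last merged interval that starts at or before pos.
        i = bisect_right(starts, pos) - 1
        if i < 0 or pos >= merged[i][1]:
            kept.append(pos)
    return kept
```

If the bimodal stretches disappear once masked sites are removed, a sequencing or mapping artifact becomes the more likely explanation.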

In general, though, bimodality of iHS is typically evidence that you have some recent selection in your genomes.

Zachary
