James-S-Santangelo opened this issue 4 weeks ago
Hi James,
So, I think your best bet for (1) is to take the coordinates that define the windows you're interested in, intersect them so they align maximally, and compute the inter-group pairwise sequence distance. If it is "low", the same haplotypes might be sweeping in both populations; if it is "high", they may be different haplotypes. I suppose you could try to define low and high with some sort of genome-wide resampling procedure, but you'd have to design it carefully.
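A minimal sketch of this approach, using only the Python standard library. It assumes (hypothetically) that candidate windows are half-open `(start, end)` base-pair intervals on the same chromosome, and that haplotypes from both populations have been extracted as `0`/`1` strings over the same SNPs inside an intersected window. The function names are illustrative and not part of lassip. The distance here is the mean per-SNP number of differences over all pairs made of one haplotype from each population (a dxy-like quantity restricted to the SNPs in the window), and the empirical quantile places an observed value within distances computed for windows resampled across the genome.

```python
from itertools import product


def intersect_windows(windows_a, windows_b):
    """Return the overlapping parts of two lists of half-open (start, end)
    windows on the same chromosome (assumed coordinate convention)."""
    overlaps = []
    for (start_a, end_a), (start_b, end_b) in product(windows_a, windows_b):
        start, end = max(start_a, start_b), min(end_a, end_b)
        if start < end:  # keep only windows that genuinely overlap
            overlaps.append((start, end))
    return sorted(overlaps)


def mean_between_distance(haps_a, haps_b):
    """Mean per-SNP number of differences between every pair made of one
    haplotype from population A and one from population B."""
    n_sites = len(haps_a[0])
    if any(len(h) != n_sites for h in list(haps_a) + list(haps_b)):
        raise ValueError("all haplotypes must cover the same SNPs")
    total_diffs = 0
    for hap_a, hap_b in product(haps_a, haps_b):
        total_diffs += sum(x != y for x, y in zip(hap_a, hap_b))
    return total_diffs / (len(haps_a) * len(haps_b) * n_sites)


def empirical_quantile(observed, null_values):
    """Fraction of resampled (null) distances that are <= the observed one."""
    return sum(v <= observed for v in null_values) / len(null_values)


if __name__ == "__main__":
    # Made-up coordinates and haplotypes, for illustration only.
    pop1_windows = [(100, 500), (2000, 2600)]
    pop2_windows = [(300, 900)]
    print(intersect_windows(pop1_windows, pop2_windows))  # [(300, 500)]

    pop1_haps = ["0011", "0011", "0111"]
    pop2_haps = ["0011", "1011"]
    d = mean_between_distance(pop1_haps, pop2_haps)
    print(round(d, 4))  # 0.2083

    # Distances from windows resampled across the genome (made up here).
    null = [0.05, 0.1, 0.3, 0.4]
    print(empirical_quantile(d, null))  # 0.5
```

A low quantile would correspond to a "low" distance. One likely part of designing the resampling carefully is making the null windows comparable to the focal ones, for example in SNP count.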
For (2), your chi-squared approach seems reasonable to me.
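A minimal sketch of the binning and chi-squared test, standard library only. It assumes per-region m estimates are available as a list of integers for each population; `bin_sweeps` and `chi2_2x2` are hypothetical helper names. The test is Pearson's chi-squared without continuity correction; with 1 degree of freedom the p-value is `erfc(sqrt(X2 / 2))`. If SciPy is available, `scipy.stats.chi2_contingency` does the same job (it applies Yates' correction to 2x2 tables by default), and `scipy.stats.fisher_exact` is an option when expected counts are small.

```python
import math


def bin_sweeps(m_values):
    """Count hard (m == 1) and soft (m > 1) sweeps from per-region m estimates."""
    hard = sum(1 for m in m_values if m == 1)
    soft = sum(1 for m in m_values if m > 1)
    return hard, soft


def chi2_2x2(table):
    """Pearson chi-squared test of independence on a 2x2 table of counts,
    without continuity correction. Returns (statistic, p_value), df = 1."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one count")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Upper tail of a chi-squared distribution with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Made-up m estimates for sweep regions in each population.
    pop1_m = [1, 1, 1, 2, 3, 1, 4]
    pop2_m = [2, 3, 1, 5, 2, 2]
    table = [list(bin_sweeps(pop1_m)), list(bin_sweeps(pop2_m))]
    print(table)  # [[4, 3], [1, 5]]
    stat, p = chi2_2x2(table)
    print(f"X2 = {stat:.3f}, p = {p:.3f}")
```

In this toy table all four expected counts fall below 5, which is the usual situation where Fisher's exact test is preferred over the chi-squared approximation.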
-Zachary
On Wed, Jun 19, 2024 at 4:19 PM James Santangelo @.***> wrote:
Hey Zachary,
I'm hoping you can help me think through an analysis I'm working on. I have two populations, which we can assume are panmictic. I have run cross-population stats (e.g., XP-nSL, Fst) and single-population stats (e.g., nSL, iHH12, iHS, saltiLassi), and have found multiple regions with signatures of positive selection in each population based on these stats. Broadly, I'm interested in a first-pass characterization and comparison of the sweep architectures in these two populations. Here are my questions:
1. Is there any way to compare the haplotype frequency spectra between the two populations? I have run saltiLassi in 201-SNP windows with a step of 50 and K set to 20. However, since lassip is run on each population independently, my sense is that haplotype 1 in pop 1 is not the same as haplotype 1 in pop 2, so raw comparisons of the haplotype frequency spectra in a given region between these two populations would be misleading.
2. In each population, I have estimates of m (i.e., the number of sweeping haplotypes) and A (i.e., the width of sweeps) for putatively selected regions of the genome. I'm interested in comparing m and A between these populations to make broad statements about differences in hard (m = 1) vs. soft (m > 1) sweeps, and similarly about differences in sweep width (A). I was planning on binning m into m = 1 and m > 1 (as you suggest in the paper) and probably just doing a simple chi-squared test on the frequency of hard vs. soft sweeps between the two populations. Again, this is just a first pass and is mostly meant to stimulate discussion and suggest avenues for future work. Does this seem reasonable, or is there any reason you can think of that such an approach would be misleading or downright wrong?
Thanks in advance!
James
View it on GitHub: https://github.com/szpiech/lassip/issues/7