pblischak / HyDe

Hybridization detection using phylogenetic invariants
http://hybridization-detection.readthedocs.io
GNU General Public License v3.0

Gamma values > 1.0 and negative Z scores #17

Open burbrink opened 3 years ago

burbrink commented 3 years ago

Hello,

I have concatenated a RAD dataset for 100 individuals from two populations, plus an outgroup, totaling >200,000 bp (all sites, not just SNPs). I placed all individuals with ADMIXTURE ancestry coefficients between 0.15 and 0.85 into a category labeled hybrids. These individuals occur at the geographic interface between the two populations and likely include some hybrids or introgressed individuals.

When I ran dat.test_triple("p1", "hybrids", "p2"), it did not detect the population as hybrid: 'Zscore': 1.1033779680260525, 'Pvalue': 0.13493157539518108, 'Gamma': 0.3994434282279553

Similarly, when I run dat.test_individuals, it finds no significant hybrids, even though some of these individuals are likely hybrids. Some of the results yield gammas close to 0.5 and are not significant. Others yield negative Z scores with gammas > 1.0. Here are two examples:

'Zscore': 0.633475935023594, 'Pvalue': 0.26321138714970815, 'Gamma': 0.3780456480258329

'Zscore': -0.29460033809760533, 'Pvalue': 0.6158503303969829, 'Gamma': 1.1279513034923758
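A side note for readers of this thread: the p-values quoted above appear to be one-sided, upper-tail probabilities of a standard normal, so a negative Z score always gives a p-value above 0.5. The sketch below only checks that relationship against the numbers in this post. The function name `upper_tail_p` is made up for illustration and is not part of HyDe.

```python
import math


def upper_tail_p(z):
    """One-sided (upper-tail) p-value for a standard normal Z score.

    Hypothetical helper for illustration; not a HyDe function.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Z scores copied from the results quoted in this thread.
reported = {
    1.1033779680260525: 0.13493157539518108,
    0.633475935023594: 0.26321138714970815,
    -0.29460033809760533: 0.6158503303969829,
}

for z, p_reported in reported.items():
    print(f"Z = {z:+.4f}  computed p = {upper_tail_p(z):.6f}  reported p = {p_reported:.6f}")
```

Under this reading, the sign of Z and the p-value carry the same information, so the gamma of 1.13 with a negative Z is not a separate anomaly in the test itself.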

I realize that the Z scores are not high enough to be significant, but I am wondering how the gammas could be this high and not significant.

I read the new paper by Kong and Kubatko comparing the performance of HyDe with structure-based methods, but it is hard to believe that so many instances of admixture would go undetected by HyDe. Might this have to do with missing data, which here could be upwards of 30% for some individuals?

Thanks so much for taking the time to look this over and answer me.

Frank

--

pblischak commented 3 years ago

Hi Frank,

So I think in this case the number of SNPs compared to total base pairs is important -- constant sites don't go into the calculation of the test statistic for detecting hybridization. That's why we tell folks to just combine everything since HyDe takes care of the rest. From the testing I've done, HyDe's power to detect hybridization is usually pretty bad if you have fewer than ~10k SNPs. You can still get an estimate of gamma that makes sense, but the test statistic will just be too small to be significant.
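To illustrate how a gamma above 1.0 can arise when there are few informative sites, here is a rough sketch. It assumes gamma is a ratio of the form f1 / (f1 + f2), where f1 and f2 are differences between observed site-pattern counts, which is my reading of the HyDe paper. The pattern labels, the function name `gamma_from_counts`, and all counts below are hypothetical and are not taken from this dataset or from HyDe's code.

```python
def gamma_from_counts(n_iijj, n_ijij, n_ijji):
    """Ratio-style gamma estimate from three site-pattern counts.

    ASSUMPTION: gamma = f1 / (f1 + f2), with
        f1 = n_iijj - n_ijij
        f2 = n_ijji - n_ijij
    This is an illustrative helper, not HyDe's implementation.
    """
    f1 = n_iijj - n_ijij
    f2 = n_ijji - n_ijij
    if f1 + f2 == 0:
        raise ZeroDivisionError("f1 + f2 is zero; gamma is undefined")
    return f1 / (f1 + f2)


# Hypothetical counts where both differences are positive: gamma falls in (0, 1).
print(gamma_from_counts(n_iijj=520, n_ijij=400, n_ijji=580))

# Hypothetical counts where sampling noise makes f2 slightly negative:
# the denominator shrinks below the numerator and gamma exceeds 1.
print(gamma_from_counts(n_iijj=490, n_ijij=400, n_ijji=390))
```

With only a few thousand variable sites, the differences f1 and f2 are small counts, so noise can push one of them below zero. Under the assumed ratio form that is enough to move gamma outside the 0 to 1 range.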

One thing you might be able to do would be to run the individual bootstrapping method with the hybrid population. I would imagine that if these individuals are actually admixed, they will have estimates of gamma that are always greater than zero. So you might be able to confirm they are hybrids using a bootstrapped p-value rather than using the test statistic returned by HyDe.
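A minimal sketch of how the bootstrap replicates could be summarized, assuming you have already collected one gamma estimate per replicate into a plain Python list (how you extract them from HyDe's bootstrap output is left out here). The function name `bootstrap_gamma_summary` and the replicate values are hypothetical. The "p-value" is simply the fraction of replicates whose gamma falls at or outside the bounds 0 and 1.

```python
import statistics


def bootstrap_gamma_summary(gammas, lower=0.0, upper=1.0):
    """Summarize per-replicate gamma estimates from a bootstrap run.

    Hypothetical helper, not part of HyDe. Returns the median gamma and
    the fraction of replicates with gamma <= lower or gamma >= upper,
    which can be read as an empirical p-value for "no admixture".
    """
    if not gammas:
        raise ValueError("need at least one bootstrap replicate")
    outside = sum(1 for g in gammas if g <= lower or g >= upper)
    return {
        "n_reps": len(gammas),
        "median_gamma": statistics.median(gammas),
        "p_outside": outside / len(gammas),
    }


# Made-up replicate values for illustration only.
example_gammas = [0.35, 0.41, 0.38, 0.44, 1.12, 0.40, 0.37, 0.42, 0.39, 0.36]
print(bootstrap_gamma_summary(example_gammas))
```

In practice you would want many more replicates than the ten shown here, since the smallest nonzero p-value this approach can report is 1 divided by the number of replicates.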

Hopefully that helps at least a little bit! Feel free to let me know if anything comes up, though.

burbrink commented 3 years ago

Hi Paul,

Thanks for the email. OK, I will try the bootstrapping procedure then. I only have 4K SNPs here, so I’ll take it all with a grain of salt.

Thanks again.

Frank

