pblischak / HyDe

Hybridization detection using phylogenetic invariants

http://hybridization-detection.readthedocs.io

MIT License

Phased data #13

Open kyleaoconnell22 opened 4 years ago

kyleaoconnell22 commented 4 years ago

Hello Paul, Can the input alignment be phased, or should it be the consensus with one line per individual? Thanks, Kyle

pblischak commented 4 years ago

Hi Kyle -- I think that using a phased alignment should be fine. Let me know if there are any issues, though.

kyleaoconnell22 commented 4 years ago

Thanks Paul, I will let you know if anything looks weird.

Kyle

Kyle O'Connell Postdoctoral Fellow The George Washington University Department of Biology

burbrink commented 3 years ago

Following up on Kyle's question, I see that even with phased data, run_hyde.py still seems to require that the number of individuals be double what it actually is. For instance, I have 144 individuals phased into 288 sequences. Of course this will not run with -n 144. Is there a way for HyDe to understand that there are two sequences per individual, so that it can leverage that phased info?

Thank you in advance!

Frank

pblischak commented 3 years ago

Ah, I think I may not have completely understood Kyle's original question -- sorry about that! I think it would be more correct for the -n option to mean "number of chromosomes" sampled from the population, rather than number of individuals. However, I imagine that if you want to run HyDe at the individual level and you have phased data, you would want the analysis to be done for the individual and not separately for each of its chromosomes. For the population-level run_hyde.py analysis, I think you should be OK just treating each phased sequence as an "individual", because the model assumes that chromosomes are exchangeable anyway.

That being said, do you think it would be useful to be able to associate phased data with individuals? I've been slowly tinkering away at a v1.0 release of HyDe and could try to add this if I get the chance.
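For anyone trying the population-level workaround described above (treating each phased sequence as its own "individual"), here is a rough sketch of how the map entries and the -n value could be derived. It assumes a two-column, tab-separated map layout (sequence name, then population) and that each map row lines up with a sequence in the alignment; check both against the HyDe documentation. The function name `haplotype_map` and the `_a`/`_b` suffixes are hypothetical and only illustrate the idea.

```python
def haplotype_map(individuals, suffixes=("_a", "_b")):
    """Expand (individual, population) pairs into one row per phased sequence.

    The suffixes are a hypothetical naming scheme; use whatever labels
    your phased alignment actually carries.
    """
    rows = []
    for name, pop in individuals:
        for suffix in suffixes:
            # Each phased sequence is listed as its own "individual".
            rows.append((name + suffix, pop))
    return rows


# Toy input: three diploid individuals from two populations.
individuals = [("ind1", "popA"), ("ind2", "popA"), ("ind3", "popB")]

rows = haplotype_map(individuals)

# Assumed map-file layout: name<TAB>population, one row per sequence.
map_text = "\n".join(f"{name}\t{pop}" for name, pop in rows)

# Value to pass as -n: the number of sequences, not the number of individuals.
n_sequences = len(rows)

print(map_text)
print("-n", n_sequences)
```

With 144 diploid individuals this gives 288 rows, which matches the count that -n expects in the case discussed in this thread.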

burbrink commented 3 years ago

I haven't thought through how HyDe would handle that computationally and translate it into an estimate of the amount of admixture, but other software (e.g., Structure, sNMF, TESS3r) typically retains the phased SNPs within individuals to understand admixture, or in some cases F1 or F2 status (NewHybrids). HyDe clearly works differently, so I don't know whether something like this would be a big overhaul for you, but it seems like the phased info should be an advantage for really understanding the nature of hybridization in individuals.