Closed 000generic closed 3 years ago
Eric,
I would still not filter these gene trees. We have not seen species tree topologies improving after removing genes simply because the gene does not have enough species. The ranges you described do not seem extreme to me (assuming these are two different datasets and not one dataset).
Regards Siavash
On Fri, Jun 5, 2020 at 6:04 PM Eric Edsinger notifications@github.com wrote:
I have BUSCO Metazoa and BUSCO Mollusca gene sets for ~50 cephalopod species. The number of genes per species for most species is around 700 and 3500, respectively - but ranges from 550 - 950 and 2250 - 4500 across species.
In reading your documentation and papers, I understand it is generally better not to filter out genes that are missing some species - but I was wondering if that holds true even at this scale - as some species will only have around half of the genes used to build the species tree. I'm unsure if this would be an extreme case - or the norm - for working with ASTRAL.
Thank you! Eric
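To put a number on "around half of the genes", one could tabulate per-species occupancy before deciding anything. The following is only a minimal sketch: the gene and species names are made up, and `species_occupancy` is a hypothetical helper, not part of ASTRAL or BUSCO.

```python
from collections import Counter

def species_occupancy(gene_to_species):
    """Return {species: fraction of genes in which that species is present}."""
    n_genes = len(gene_to_species)
    counts = Counter()
    for species in gene_to_species.values():
        counts.update(set(species))  # count each species once per gene
    return {sp: counts[sp] / n_genes for sp in counts}

# Toy data (hypothetical gene and species names).
genes = {
    "busco_1": {"octopus", "squid", "nautilus"},
    "busco_2": {"octopus", "squid"},
    "busco_3": {"octopus", "nautilus"},
    "busco_4": {"octopus", "squid"},
}

occupancy = species_occupancy(genes)
for sp, frac in sorted(occupancy.items()):
    print(f"{sp}\t{frac:.2f}")
```

A table like this only describes the data; per the advice in this thread, low occupancy alone is not a reason to drop genes.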
-- Siavash Mirarab
Thanks Siavash! I'll leave all species and BUSCO genes in and give it a go. Hope to have the final gene sets processed today and start working my way forward this week.
I was thinking to run per BUSCO gene set across species:
MAFFT: BUSCO gene set sequence alignment
trimAl: Alignment trimming
Exelixis Lab Perl script: Model selection
RAxML: Best ML gene tree building
TreeShrink: Long branch gene tree pruning
ASTRAL: Species tree building
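The steps above could be strung together roughly as follows. This is a sketch under several assumptions, not a tested pipeline: it only builds command strings and does not run them; it assumes protein alignments and swaps the separate model-selection script for RAxML's `PROTGAMMAAUTO` option; the executable names (`raxmlHPC`, `run_treeshrink.py`, `astral.jar`) vary by install; and all file names, including the one standing in for TreeShrink's output, are hypothetical.

```python
from pathlib import Path

def per_gene_commands(fasta, workdir="work", model="PROTGAMMAAUTO", seed=12345):
    """Build (not run) alignment, trimming, and ML tree commands for one gene."""
    gene = Path(fasta).stem
    aligned = f"{workdir}/{gene}.aln.fasta"
    trimmed = f"{workdir}/{gene}.trim.fasta"
    return [
        f"mafft --auto {fasta} > {aligned}",
        f"trimal -in {aligned} -out {trimmed} -automated1",
        # RAxML names its best tree RAxML_bestTree.<run name>.
        f"raxmlHPC -s {trimmed} -n {gene} -m {model} -p {seed}",
    ]

def species_tree_commands(genes, gene_trees="genes.tre",
                          shrunk_trees="genes.shrunk.tre",
                          species_tree="species.tre",
                          astral_jar="astral.jar"):
    """Build (not run) the commands that go from gene trees to a species tree."""
    best = " ".join(f"RAxML_bestTree.{g}" for g in genes)
    return [
        f"cat {best} > {gene_trees}",
        f"run_treeshrink.py -t {gene_trees}",
        # shrunk_trees is a placeholder for whichever file TreeShrink wrote.
        f"java -jar {astral_jar} -i {shrunk_trees} -o {species_tree}",
    ]

# Hypothetical input file name.
for cmd in per_gene_commands("busco/gene0001.faa"):
    print(cmd)
for cmd in species_tree_commands(["gene0001", "gene0002"]):
    print(cmd)
```

Printing the commands first makes it easy to check paths and options by eye before handing them to a shell or a job scheduler.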
Are there other steps or software you recommend upstream of ASTRAL? Is it a good idea / common practice to run ASTRAL in parallel with Bayesian trees? Is partitioning for gene tree building commonly done? Finally, coming out of ASTRAL, what is recommended or common practice? Thoughts or guidance on any of this would be great, as I'm new to it all in regards to species evolution.
Thank you!
Hi Eric,
You may find this paper useful:
Siavash Mirarab. “Species Tree Estimation Using ASTRAL: Practical Considerations.” arXiv:1904.03826 (2019). https://arxiv.org/pdf/1904.03826.pdf
Your pipeline is good, but I would also suggest removing fragmentary sequences before inferring gene trees. You may have meant this when you said trimAl, but I am not sure.
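One simple way to drop fragmentary sequences is to remove any sequence that covers too little of its gene alignment. The sketch below assumes that definition; the 50% cutoff, the helper names, and the toy alignment are all illustrative assumptions rather than values recommended in this thread.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}

def drop_fragments(alignment, min_fraction=0.5):
    """Keep sequences whose non-gap fraction is at least min_fraction."""
    kept = {}
    for name, seq in alignment.items():
        if not seq:
            continue  # skip empty records
        residues = sum(1 for c in seq if c not in "-?")
        if residues / len(seq) >= min_fraction:
            kept[name] = seq
    return kept

# Toy aligned gene (hypothetical sequences).
toy = """>octopus
MKTAYIAKQR
>squid
MKT-YIAKQR
>nautilus
------AKQ-
"""

filtered = drop_fragments(read_fasta(toy))
print(sorted(filtered))
```

Here the `nautilus` sequence covers only 3 of 10 columns and is removed from this gene, while the species stays in every other gene where its sequence is more complete.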
Bayesian gene trees in our experience have been better than ML. I would suggest just taking the consensus of the distribution (e.g., MRCA) for each gene.
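As an illustration of summarizing a posterior sample into one tree per gene, here is a minimal sketch of the counting step behind a majority-rule consensus. It assumes that this is the kind of consensus meant, uses rooted toy trees written as nested tuples, and only reports clade frequencies; in practice the Bayesian software's own summary tool would normally be used, and gene trees given to ASTRAL are treated as unrooted.

```python
from collections import Counter

def clades(tree):
    """Return the leaf set of every internal node of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):  # a leaf is a taxon name
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found

def majority_clades(trees):
    """Return {clade: frequency} for clades found in more than half the trees."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))
    n = len(trees)
    return {c: k / n for c, k in counts.items() if k / n > 0.5}

# Toy posterior sample for one gene (hypothetical taxa A-D).
sample = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
]

support = majority_clades(sample)
for clade, freq in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(clade), round(freq, 2))
```

The clade containing all taxa always appears with frequency 1.0; the informative entries are the smaller clades that pass the 50% cutoff.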
Thanks Siavash
-- Siavash Mirarab
Thanks for the link to your paper!
I've done mostly ML gene trees in the past but will give Bayesian trees a try for ASTRAL.
I'll be sure to remove fragments prior to tree building - thanks for the advice. I'm still learning trimAl, but it trims to produce blocked alignments, like Gblocks, and I think it scales better. Scaling is not so critical for ~50 species per gene - but it is for other comparative work using ~100-500 genomes - and I'm hoping to use the same tools.
Thanks again, Eric