pierrebarbera / epa-ng

Massively parallel phylogenetic placement of genetic sequences
GNU Affero General Public License v3.0

FastTree #47

Open atoselan opened 1 year ago

atoselan commented 1 year ago

Hi, I'm setting up a pipeline to place query sequences onto a phylogenetic tree of ~5,000 reference sequences. I'm finding RAxML to be very slow for tree building and wondered whether EPA-ng supports trees built with FastTree. Will this be a problem when I have to supply the model parameters to EPA-ng? I've looked online but haven't found any examples of this.

Cheers, Andrew

lczech commented 1 year ago

Hey @atoselan,

As far as I am aware, this should work as long as you have a Newick tree and its model parameters in some form that EPA-ng understands. See https://github.com/pierrebarbera/epa-ng#setting-the-model-parameters for the specification of the model parameters that EPA-ng expects.

I don't know in which format FastTree outputs its parameters. If they are not in that format, you can take the Newick file from FastTree and use RAxML-ng to obtain the model parameters for it. This does not run the whole tree search; it only evaluates the given tree and reports these parameters, as explained in the above link as well.
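A minimal sketch of that workflow, written as Python helpers that only build the command lines (nothing is executed). The `--evaluate` mode of RAxML-ng and the `.raxml.bestModel` output file are the ones described in the EPA-ng README; the file names (`ref.fasta`, `fasttree.newick`, `query.fasta`), the `info` prefix, and the `GTR+G` model are placeholder assumptions that need to be adapted to the actual data.

```python
import shlex


def raxml_evaluate_cmd(ref_msa, tree, model="GTR+G", prefix="info"):
    """Build a raxml-ng call that evaluates a fixed tree (no tree search)."""
    return [
        "raxml-ng", "--evaluate",
        "--msa", ref_msa,      # reference alignment used to build the tree
        "--tree", tree,        # Newick tree, e.g. as written by FastTree
        "--model", model,      # assumed model; pick the one matching your data
        "--prefix", prefix,    # output files are named <prefix>.raxml.*
    ]


def best_model_path(prefix="info"):
    """File in which raxml-ng stores the optimized model parameters."""
    return f"{prefix}.raxml.bestModel"


def epa_ng_cmd(ref_msa, tree, query_msa, model_file, outdir="."):
    """Build an epa-ng call that reads the model parameters from a file."""
    return [
        "epa-ng",
        "--ref-msa", ref_msa,
        "--tree", tree,
        "--query", query_msa,  # queries aligned to the reference alignment
        "--model", model_file,
        "--outdir", outdir,
    ]


# Print the two commands so they can be inspected or pasted into a shell.
print(shlex.join(raxml_evaluate_cmd("ref.fasta", "fasttree.newick")))
print(shlex.join(epa_ng_cmd("ref.fasta", "fasttree.newick", "query.fasta",
                            best_model_path())))
```

The lists can be passed to `subprocess.run(cmd, check=True)` once the paths are correct. Note that RAxML-ng may re-optimize branch lengths during evaluation, so it is worth checking which tree file you then hand to EPA-ng.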

Hope that helps, Lucas

stamatak commented 1 year ago

For a reference tree of 5,000 sequences, you should maybe also take the tree inference uncertainty into account; see for instance here:

https://academic.oup.com/mbe/article/38/5/1777/6030946

and also our new tool for predicting the difficulty of a phylogenetic analysis:

https://academic.oup.com/mbe/article/39/12/msac254/6832260

Alexis

-- Alexandros (Alexis) Stamatakis

ERA Chair, Institute of Computer Science, Foundation for Research and Technology - Hellas
Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.biocomp.gr (Crete lab)
www.exelixis-lab.org (Heidelberg lab)

atoselan commented 1 year ago

Many thanks, it turned out to be straightforward to get an info file for the FastTree tree using RAxML.

I know this is a different issue, but I can't get PaPaRa to work: I get a vague error message about an inconsistency in the alignment, which I can't resolve. I've started looking at using HMMER/hmmalign instead and wondered if you had any examples to follow for this approach. hmmbuild and hmmalign run fine, but how do I ensure that I have alignments of the same length? I've created an HMM from the reference alignment and aligned the queries to the HMM, but now the reference and query alignments are of different lengths.

lczech commented 1 year ago

Hm, if I recall correctly, hmmer/hmmalign uses a flag -m to keep the length. I'd check their manual :-)
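Whichever hmmalign option turns out to be the right one, the result can be checked before running EPA-ng. The following is a minimal sketch that assumes both alignments have been converted to FASTA; the function names are hypothetical helpers, not part of HMMER or EPA-ng. It collects the distinct sequence lengths in each alignment and reports whether reference and queries share a single width.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            # Sequences may be wrapped over several lines; join them.
            records[name] += line
    return records


def alignment_widths(records):
    """Return the set of distinct sequence lengths in an alignment."""
    return {len(seq) for seq in records.values()}


def same_width(ref_records, query_records):
    """True if every reference and query sequence has the same length."""
    widths = alignment_widths(ref_records) | alignment_widths(query_records)
    return len(widths) == 1


# Toy data: the second query is one column too long.
ref = read_fasta(">r1\nACGT-A\n>r2\nAC-TTA\n")
ok_query = read_fasta(">q1\nA--TTA\n")
bad_query = read_fasta(">q2\nA--TTAC\n")
print(same_width(ref, ok_query), same_width(ref, bad_query))
```

For real files, read each file's contents with `open(path).read()` and pass the text to `read_fasta`. If the widths differ, the alignment step needs to be revisited rather than padding sequences by hand, since padding would shift the columns relative to the reference.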