pierrebarbera / epa-ng

Massively parallel phylogenetic placement of genetic sequences
GNU Affero General Public License v3.0

Treeparsing error #12

gavinmdouglas closed 6 years ago

gavinmdouglas commented 6 years ago

Hi there,

I'm using the latest version of epa-ng and am getting this error when I try to run it:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Treeparsing failed!

The input tree (edit: which is in newick format) is unrooted and is based on 16,539 sequences. Any idea how I can better troubleshoot what is causing the error?

Thanks,

Gavin

lczech commented 6 years ago

Hi @gavinmdouglas,

Good that you made the edit, I was just about to ask whether the file is Newick ;-) Does it contain any further annotations, such as support values? If so, try removing them. If not, could you please send us the file?

Lucas

pierrebarbera commented 6 years ago

Hi Gavin,

I just pushed a quick fix to the master branch improving the error message. Please do a pull, recompile and rerun!

Pierre

gavinmdouglas commented 6 years ago

Thanks @lczech and @Pbdas for your rapid responses and quick push! My tree did not have support values, so that wasn't the problem here. The issue was that my tree was non-binary, i.e. at least one node had more than 2 children. This was the error I got after your change to the error reporting:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Treeparsing failed! syntax error, unexpected ',', expecting ')'. (line 1 column 8124-8125)

I'm not sure how pervasive this problem was, but a crude fix was to randomly resolve the polytomies with the ape R package, and it works now!
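For anyone who wants to check how pervasive polytomies are in a tree before resolving them, here is a minimal Python sketch that counts the children of every node in a Newick string. It is not how epa-ng parses trees. The function name `find_polytomies` and the `root_max` parameter are made up for illustration, and it assumes plain Newick without quoted labels or bracketed comments (those could contain parentheses or commas and would confuse it). The top-level node of an unrooted Newick tree normally has three children, so by default the sketch only flags the root when it has more than three.

```python
def find_polytomies(newick, root_max=3):
    """Return (column, n_children) for every node with too many children.

    column is the 1-based position of the node's closing parenthesis.
    Inner nodes may have at most 2 children, the top-level node at most
    root_max. Assumes no quoted labels and no [comments] in the string.
    """
    stack = []  # number of commas seen so far for each currently open "("
    hits = []
    for col, ch in enumerate(newick, start=1):
        if ch == "(":
            stack.append(0)
        elif ch == "," and stack:
            stack[-1] += 1
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {col}")
            n_children = stack.pop() + 1
            # An empty stack after popping means this was the top-level node.
            limit = root_max if not stack else 2
            if n_children > limit:
                hits.append((col, n_children))
    if stack:
        raise ValueError("unbalanced parentheses: missing ')'")
    return hits


# The inner node (A,B,C) has three children; the root's three are allowed.
print(find_polytomies("((A,B,C),D,E);"))  # [(8, 3)]
```

An empty list means every inner node is bifurcating. Reading the tree with `open(path).read()` and passing the string in should be enough for a single-line Newick file.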

I ran into a different problem after this fix: I didn't realize that the query sequences needed to be aligned with the reference sequences. Is there a tool you would recommend for aligning query sequences to an existing multiple sequence alignment? I ended up making a new MSA of all the sequences and then splitting it into separate query and reference alignments, but this isn't ideal, since the resulting reference MSA differs slightly from the one I used to create the reference tree.
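A quick sanity check that would have caught this earlier is to confirm that the reference and query FASTA files have the same alignment width. The sketch below is only an illustration under simple assumptions (plain FASTA text, no special handling of unusual headers). The helper names `parse_fasta`, `alignment_width` and `check_widths` are hypothetical and not part of epa-ng.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped sequence lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_width(records):
    """Return the common sequence length, or raise if lengths differ."""
    widths = {len(seq) for seq in records.values()}
    if len(widths) != 1:
        raise ValueError(f"not a valid alignment, sequence lengths: {sorted(widths)}")
    return widths.pop()


def check_widths(ref_text, query_text):
    """Return True if both alignments have the same number of columns."""
    return alignment_width(parse_fasta(ref_text)) == alignment_width(parse_fasta(query_text))


ref = ">r1\nAC-GT\n>r2\nACGGT\n"
aligned_query = ">q1\n--CGT\n"
unaligned_query = ">q1\nCGT\n"
print(check_widths(ref, aligned_query))    # True
print(check_widths(ref, unaligned_query))  # False
```

Equal width is necessary but not sufficient: it only shows that the files could share columns, not that the queries were actually aligned against the reference.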

It might be helpful to state in the usage section of the README that the query sequences need to be in MSA format: https://github.com/Pbdas/epa-ng#usage (I realized later that this is mentioned elsewhere).

Thanks for your help!

lczech commented 6 years ago

That was quickly solved. By the way, for the future: We prefer discussing problems with input files on our Google Group. But as this started as an issue with the program itself, let's continue here ;-) (I hope @Pbdas agrees)

What program did you use to infer your tree? Usually, I'd expect that program to produce bifurcating trees; otherwise it does not make much sense to conduct phylogenetic placement. Also, for best results, it should be a maximum likelihood tree, because epa-ng is also likelihood-based.

As for aligning query sequences, we use PaPaRa. It uses the tree and the reference alignment to produce a phylogenetically informed alignment for each query sequence. There is also a tool called hmmalign, which I think is part of the HMMER suite. I have never used it, though.

gavinmdouglas commented 6 years ago

OK, good to know about the Google Group. I used FastTree to generate the tree. I have been testing RAxML as well, but so far the phylogenies haven't been very different. I hadn't heard of PaPaRa; it sounds useful. Thanks for your thoughts!