neherlab / treetime

Maximum likelihood inference of time stamped phylogenies and ancestral reconstruction
MIT License
223 stars, 55 forks

Unable to import tree using treeAnc #73

Closed by pilpel-lab 5 years ago

pilpel-lab commented 5 years ago

Hey, we're trying to import a tree using TreeAnc. We've encountered several issues:

  1. The module was unable to parse a tree in nwk format generated by Python's ete3.NCBITaxa.get_topology. We worked around this by downloading a phylip-formatted tree from NCBI. We do not work with phylogenetic trees regularly, so there may be an issue with the ete3 module; we're just raising a flag. The nwk-formatted tree is viewable in MEGA X.
  2. The TreeAnc module was unable to parse the alignment file, a FASTA file downloaded from OrthoDB and aligned using Clustal Omega.
  3. In both cases no error was raised, so we had to inspect the code to try to figure out what went wrong.

We'd appreciate any advice regarding the alignment file. All files are attached: treeTime_files.zip

Thanks, Omer and Alisa, Pilpel lab

rneher commented 5 years ago

Thanks for reaching out. Regarding your issues:

Running treetime ancestral --aln aln.fasta --tree phyliptree.phy parses both the alignment and the tree ok, but the taxon names in the tree don't match the sequence names in the alignment:

:> treetime ancestral --aln aln.fasta --tree phyliptree.phy
0.00    -TreeAnc: set-up
0.04    ***WARNING: TreeAnc._attach_sequences_to_nodes: NO SEQUENCE FOR LEAF: Oryctolagus cuniculus
0.04    ***WARNING: TreeAnc._attach_sequences_to_nodes: NO SEQUENCE FOR LEAF: Ochotona princeps
0.04    ***WARNING: TreeAnc._attach_sequences_to_nodes: NO SEQUENCE FOR LEAF: Nannospalax galili
0.04    ***WARNING: TreeAnc._attach_sequences_to_nodes: NO SEQUENCE FOR LEAF: Fukomys damarensis
0.04    ***WARNING: TreeAnc._attach_sequences_to_nodes: NO SEQUENCE FOR LEAF: Jaculus jaculus
0.04    ERROR: At least 30% terminal nodes cannot be assigned with a sequence!

The same problem is encountered when using new_tree.nwk:

:> treetime ancestral --aln aln.fasta --tree new_tree.nwk
0.00    -TreeAnc: set-up
0.04    ***WARNING: TreeAnc._attach_sequences_to_nodes: NO SEQUENCE FOR LEAF: 9986
0.04    ***WARNING: TreeAnc._attach_sequences_to_nodes: NO SEQUENCE FOR LEAF: 9978
0.04    ***WARNING: TreeAnc._attach_sequences_to_nodes: NO SEQUENCE FOR LEAF: 10020
0.04    ***WARNING: TreeAnc._attach_sequences_to_nodes: NO SEQUENCE FOR LEAF: 43179
0.04    ***WARNING: TreeAnc._attach_sequences_to_nodes: NO SEQUENCE FOR LEAF: 885580
0.05    ERROR: At least 30% terminal nodes cannot be assigned with a sequence!

Here, the issue is that your sequence names are of the form >10181_0:002977, whereas the taxon names are only 10181_0.
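
A rough sketch of one way to line the names up, in case it helps: truncate every FASTA header at the first colon, then list the tree leaves that still have no sequence. This is plain Python and not part of treetime. The function names (strip_header_suffix, sequence_names, unmatched_leaves) are made up for illustration, and the second header and the leaf list in the example are invented, so adapt them to your files.

```python
def strip_header_suffix(fasta_text, sep=":"):
    """Return FASTA text with every header line cut at the first `sep`.

    Hypothetical helper: assumes the part before `sep` is the taxon name
    used in the tree, e.g. '>10181_0:002977' becomes '>10181_0'.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            line = line.split(sep, 1)[0]
        out.append(line)
    return "\n".join(out) + "\n"


def sequence_names(fasta_text):
    """Collect sequence names (first word after '>') from FASTA text."""
    names = set()
    for line in fasta_text.splitlines():
        header = line[1:].strip() if line.startswith(">") else ""
        if header:
            names.add(header.split()[0])
    return names


def unmatched_leaves(leaf_names, fasta_text):
    """Return the sorted leaf names that have no sequence of the same name."""
    return sorted(set(leaf_names) - sequence_names(fasta_text))


# Invented example data: only the first header format comes from this thread.
example = ">10181_0:002977\nACGT\n>9986_0:001a2b\nAC-T\n"
leaves = ["10181_0", "9986_0", "9978_0"]

fixed = strip_header_suffix(example)
print(unmatched_leaves(leaves, example))  # before renaming: nothing matches
print(unmatched_leaves(leaves, fixed))    # after renaming: only leaves without a sequence remain
```

To use it on real files, read the alignment with open(...).read(), write the result of strip_header_suffix to a new FASTA file, and pass that file to treetime. The leaf names would have to come from your tree, for example via a Newick parser of your choice.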

You need to make sure the taxon names match the sequence names (colons are not admissible in nwk anyway). Also, your phylip alignment is not parsed correctly by Biopython:

In [1]: from Bio import AlignIO

In [2]: AlignIO.read('aln.phylip', 'phylip-relaxed')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-211a42c6215e> in <module>()
----> 1 AlignIO.read('aln.phylip', 'phylip-relaxed')

pilpel-lab commented 5 years ago

Thanks for the quick reply, it worked great!