niemasd / TreeSwift

TreeSwift: Fast tree module for Python 3
https://niema.net/TreeSwift
GNU General Public License v3.0

Failed reading nexus file #9

Closed (pekarj closed 5 years ago)

pekarj commented 5 years ago

I tried to read a Nexus file that I created with BEAST and TreeAnnotator. I'm running Python 3.6.8. I've attached the Nexus file, and the error message is below.

My commands are

import treeswift
filename = "./cluster7.cluster16msd.beast.MCC.tre.txt"
tree = treeswift.read_tree_nexus(filename)

If I use the Bio Phylo package, I am able to read the tree:

from Bio import Phylo
tree = Phylo.read(filename, "nexus")

The error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/.local/lib/python3.6/site-packages/treeswift/Tree.py in read_tree_newick(newick)
   1260             elif ts[i] == ',':
-> 1261                 n = n.parent; c = Node(); n.add_child(c); n = c
   1262             elif ts[i] == ':':

AttributeError: 'NoneType' object has no attribute 'add_child'

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-112-cf92dde72745> in <module>()
      1 filename = "./cluster7.cluster16msd.beast.MCC.tre"
----> 2 tree = treeswift.read_tree_nexus(filename)

~/.local/lib/python3.6/site-packages/treeswift/Tree.py in read_tree_nexus(nexus)
   1418             i = l.index('='); left = l[:i].strip(); right = l[i+1:].strip()
   1419             name = ' '.join(left.split(' ')[1:])
-> 1420             trees[name] = read_tree_newick(right)
   1421     if hasattr(f,'close'):
   1422         f.close()

~/.local/lib/python3.6/site-packages/treeswift/Tree.py in read_tree_newick(newick)
   1272             i += 1
   1273     except Exception as e:
-> 1274         raise RuntimeError("Failed to parse string as Newick: %s"%ts)
   1275     return t
   1276 

RuntimeError: Failed to parse string as Newick: ((2[&length_range={0.21597029524473332,28.547570881223923},height_95%_HPD={0.7671232876714384,0.7671232876714384},rate_95%_HPD={5.64190040109167E-4,0.0014707264379822502},length_95%_HPD={0.5455547542308703,5.622527436842115},rate=0.0010265516211584722,length=2.6535668186631356,rate_median=0.0010273081640561326,length_median=2.293838336244244,height_median=0.7671232876714384,rate_range={1.654447417087706E-4,0.002031565538373628},height_range={0.7671232876714367,0.7671232876714402},height=0.7671232876712545]:2.178557085744412,5[&length_range={0.21352458278912734,14.858858823100011},height_95%_HPD={0.4575342465755057,0.4575342465755057},rate_95%_HPD={5.64190040109167E-4,0.0014707264379822502},length_95%_HPD={0.4518083370189887,4.300775420881591},rate=0.0010265516211584722,length=2.2832812455529736,rate_median=0.0010273081640561326,length_median=2.0907047484924317,height_median=0.4575342465755057,rate_range={1.654447417087706E-4,0.002031565538373628},height_range={0.4575342465755039,0.45753424657550923},height=0.4575342465755082]:2.488146126840345)[&length_range={3.46136141267106E-4,11.0863645107428},rate_95%_HPD={5.778044583586271E-4,0.0014875016069454366},length_95%_HPD={3.46136141267106E-4,2.0279025377890725},length=0.8517693758060189,posterior=0.5726030441062104,height_median=2.4276711927194197,rate_range={1.654447417087706E-4,0.002031565538373628},height_range={0.983093582916172,11.57319222116077},height_95%_HPD={1.2454634607259805,4.422856638550174},rate=0.001054007132213332,rate_median=0.0010551713645384484,length_median=0.7111555813700576,height=2.629323489357251]:0.41418579746527273,(((6[&length_range={0.05059658443516901,14.02051762989706},height_95%_HPD={1.295875439778456,1.295875439778456},rate_95%_HPD={5.64190040109167E-4,0.0014707264379822502},length_95%_HPD={0.18465266083011134,3.0885263223362722},rate=0.0010265516211584722,length=1.4119610043729642,rate_median=0.0010273081640561326,length_median=1.220250057
0227302,height_median=1.295875439778456,rate_range={1.654447417087706E-4,0.002031565538373628},height_range={1.2958754397784542,1.2958754397784578},height=1.295875439778642]:0.7453840407930947,(3[&length_range={7.015232109476299E-4,16.842924065559952},height_95%_HPD={0.6520547945206085,0.652054794520609},rate_95%_HPD={5.64190040109167E-4,0.0014707264379822502},length_95%_HPD={7.015232109476299E-4,2.6496594360990997},rate=0.0010265516211584722,length=1.1340483187927906,rate_median=0.0010273081640561326,length_median=0.9689487218649604,height_median=0.652054794520609,rate_range={1.654447417087706E-4,0.002031565538373628},height_range={0.6520547945206054,0.6520547945206125},height=0.6520547945204879]:0.8024497976169296,4[&length_range={0.019879605402676748,9.207455791493175},height_95%_HPD={0.6328767123288794,0.6328767123288799},rate_95%_HPD={5.64190040109167E-4,0.0014707264379822502},length_95%_HPD={0.02147494205424172,3.0490046756203073},rate=0.0010265516211584722,length=1.2846266135261182,rate_median=0.0010273081640561326,length_median=1.1280194622013584,height_median=0.6328767123288799,rate_range={1.654447417087706E-4,0.002031565538373628},height_range={0.6328767123288763,0.6328767123288834},height=0.632876712328742]:0.8216278798086587)[&length_range={2.082250224288984E-5,13.309313133511994},rate_95%_HPD={6.05669757203006E-4,0.00148702055207987},length_95%_HPD={2.6687278881620813E-4,2.604127287263421},length=1.1885749818096911,posterior=0.6064881679813354,height_median=1.253481293233821,rate_range={2.068762742038897E-4,0.002031565538373628},height_range={0.6527563177315567,8.10836812842163},height_95%_HPD={0.6527563177315567,2.5270244460129914},rate=0.0010449944130555696,rate_median=0.0010433415169864044,length_median=1.092495311350225,height=1.400894455716659]:0.5867548884340121)[&length_range={8.30257280096447E-4,8.376269570332376},rate_95%_HPD={5.732602752583437E-4,0.0014592631114144791},length_95%_HPD={0.0013206790659354706,1.8258125021704914},length=0.72308203
07321237,posterior=0.43628485723808463,height_median=2.2743261039569753,rate_range={2.9277912019346316E-4,0.002031565538373628},height_range={1.346472024213625,12.153818106505899},height_95%_HPD={1.4855544693988638,3.6488813350750258},rate=0.0010456218910541131,rate_median=0.00104438397258734,length_median=0.5858829224328863,height=2.4227398099436286]:0.6482904476597948,7[&length_range={4.140998817581121E-5,14.9341159070317},height_95%_HPD={2.2384983905981244,2.2384983905981244},rate_95%_HPD={5.64190040109167E-4,0.0014707264379822502},length_95%_HPD={4.140998817581121E-5,1.7740950934847994},rate=0.0010265516211584722,length=0.5809498086967516,rate_median=0.0010273081640561326,length_median=0.38515716031797886,height_median=2.2384983905981244,rate_range={1.654447417087706E-4,0.002031565538373628},height_range={2.2384983905981226,2.238498390598126},height=2.2384983905977642]:0.45105153763322114)[&length_range={5.489799384430505E-4,6.588682011244441},rate_95%_HPD={5.558262618700811E-4,0.0014191752597354794},length_95%_HPD={5.489799384430505E-4,1.7079591100232165},length=0.5551989173223386,posterior=0.18486834796133764,height_median=2.8067614028230654,rate_range={3.1106396858220585E-4,0.002031565538373628},height_range={2.238979593885517,7.233960321866932},height_95%_HPD={2.238979593885517,4.252782100530326},rate=0.0010009970036199316,rate_median=0.001004114275137352,length_median=0.3648226628443463,height=2.9862324314594133]:0.1027684175964958,1[&length_range={0.6795085286265112,21.953876464012417},rate_95%_HPD={5.64190040109167E-4,0.0014707264379822502},length_95%_HPD={1.4798333540272428,6.141469161296121},rate=0.0010265516211584722,length=3.4199462564698933,rate_median=0.0010273081640561326,length_median=3.1017295487544727,rate_range={1.654447417087706E-4,0.002031565538373628},height=0.0]:2.7923183458278413)[&length_range={2.310267208915917E-4,9.334624664769258},rate_95%_HPD={5.70324022126953E-4,0.0014452103173617234},length_95%_HPD={2.310267208915917E-4,1.9079686360
179107},length=0.7476658493291912,posterior=0.3019664481724253,height_median=3.1128264204132643,rate_range={1.654447417087706E-4,0.002031565538373628},height_range={2.239259658823974,17.49497886008056},height_95%_HPD={2.239259658823974,4.883215181877689},rate=0.0010132172099033838,rate_median=0.0010162752500032257,length_median=0.5929916549627732,height=3.312026615454639]:0.5675478250532819)[&height_95%_HPD={2.238801736476874,6.8416346100033945},length=0.0,posterior=1.0,height_median=4.068031879497686,height_range={2.238801736476874,29.31469416889536},height=4.341552801214587];

cluster7.cluster16msd.beast.MCC.tre.txt

niemasd commented 5 years ago

The bug should now be fixed in TreeSwift version 1.1.1.

pekarj commented 5 years ago

It generally works now, but it still fails on some files, such as the one attached below. After I removed every instance of the rate parameter from the Nexus file, TreeSwift read it fine. (Bio Phylo can read the file with the parameter included, but I don't know whether you want similar functionality, since many tree files won't have it.) The rate parameter looks like this in my file:

[&rate=6.684064287829595E-4]

I've attached the original file and the edited file that works. The only difference is whether the file includes the rate parameter (at every branch, I believe).

cluster2.cluster16msd.beast.MCC.edited.tre.txt cluster2.cluster16msd.beast.MCC.tre.txt
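
For anyone who needs the same workaround, the manual edit described above (removing every rate parameter) can be sketched in a few lines of Python. This is only an illustration: `strip_rate_annotations` is a hypothetical helper, not part of TreeSwift or Biopython, and it assumes each rate comment has exactly the form shown above, with nothing but a single rate value inside the brackets.

```python
import re

# Matches a comment that carries only a rate value, e.g. [&rate=6.68E-4].
# Assumption: no commas, braces, or nested brackets inside the comment,
# so richer comments such as [&rate=1,height=2] are left untouched.
RATE_COMMENT = re.compile(r"\[&rate=[^\[\],{}]*\]")

def strip_rate_annotations(text):
    """Return text with every rate-only [&rate=...] comment removed."""
    return RATE_COMMENT.sub("", text)

# Made-up two-leaf tree with a rate comment before each edge length.
example = "(A:[&rate=6.684064287829595E-4]1.5,B:[&rate=7.0E-4]2.0);"
print(strip_rate_annotations(example))  # (A:1.5,B:2.0);
```

Applying the function to the full text of the Nexus file and saving the result should give a file equivalent to the edited one attached above, provided the assumption about the comment format holds.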

niemasd commented 5 years ago

Okay, I think I've fixed this new issue in TreeSwift version 1.1.2. Parameters like this that are prepended to edge lengths are now (hopefully) properly parsed and stored in a node's edge_params variable.
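
To make "parsed" concrete, here is a minimal sketch of how a bracketed comment like the ones in this thread could be split into key/value pairs. It is not TreeSwift's actual implementation, and `parse_annotation` is a hypothetical name. It assumes the comment starts with `[&`, ends with `]`, separates fields with commas, and uses braces only for ranges such as `{2.2,29.3}`.

```python
def parse_annotation(comment):
    """Parse a comment such as '[&rate=1.0E-3,height_range={0.1,0.2}]' into a dict of strings."""
    body = comment.strip()
    if not (body.startswith("[&") and body.endswith("]")):
        raise ValueError("not a [&...] annotation: %r" % comment)
    body = body[2:-1]  # drop the leading '[&' and the trailing ']'

    # Split on commas, but skip commas inside {...} ranges.
    fields, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == "," and depth == 0:
            fields.append(body[start:i])
            start = i + 1
    fields.append(body[start:])

    params = {}
    for field in fields:
        if not field:
            continue  # ignore empty fields, e.g. from an empty comment body
        key, _, value = field.partition("=")
        params[key] = value  # values are kept as strings
    return params

params = parse_annotation("[&rate=6.684064287829595E-4]")
print(params)  # {'rate': '6.684064287829595E-4'}
```

Values stay as strings in this sketch, so a caller that needs a number would convert explicitly, for example with `float(params["rate"])`.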