niemasd / TreeSwift

TreeSwift: Fast tree module for Python 3
https://niema.net/TreeSwift
GNU General Public License v3.0
75 stars 14 forks source link

Parsing Error - Unescaped Characters #27

Closed valevalorin closed 1 year ago

valevalorin commented 1 year ago

I have a tree containing labels with both ';' and ':' characters. I can write the tree to the newick format without issue, but if I try to parse the string it fails when it encounters the unescaped ';' and ';' characters. I don't think there is a standard way of escaping for the format, but this module uses ' as the escape characters.

niemasd commented 1 year ago

Can you provide an example file (ideally as small as possible) that I could use to test any potential fixes?

valevalorin commented 1 year ago

String: (https://youtube.com,&)root || ('https://youtube.com','&')root || ("https://youtube.com","&")root

Code:

from treeswift import Node, read_tree_newick

root = Node('root')
url = Node('https://youtube.com')
amp = Node('&')
root.add_child(url)
root.add_child(amp)

t = read_tree_newick(root.newick())

Edit: Specific Error

ValueError                                Traceback (most recent call last)
~/anaconda3/envs/tensorflow2_p38/lib/python3.8/site-packages/treeswift/Tree.py in read_tree_newick(newick)
   1444                     ls += ts[i]; i += 1
-> 1445                 n.edge_length = float(ls); i -= 1; parse_length = False
   1446 

ValueError: could not convert string to float: '//youtube.com'
niemasd commented 1 year ago

Thanks for sharing! I don't think your original string is a valid Newick string, though: even with reserved characters being handled via escaping, Newick strings should end with a semicolon. The TreeSwift.Node.newick function is not meant to be called directly and is instead meant to be called recursively to generate the partial Newick string for a given subtree (so that TreeSwift.Tree.newick can just call root.newick() and then add a semicolon and some other stuff). Here's a slightly updated code example:

from treeswift import Tree, Node
tree = Tree()
url = Node('https://youtube.com')
amp = Node('&')
tree.root.add_child(url)
tree.root.add_child(amp)
print(tree.newick())

Which outputs a Newick string that is "valid" if reserved symbols are allowed to be used if they're escaped:

[&R] (https://youtube.com,&);

This Newick string is indeed currently unsupported:

from treeswift import read_tree_newick
read_tree_newick("[&R] (https://youtube.com,&);")
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/treeswift/Tree.py", line 1445, in read_tree_newick
    n.edge_length = float(ls); i -= 1; parse_length = False
ValueError: could not convert string to float: '//youtube.com'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/treeswift/Tree.py", line 1455, in read_tree_newick
    raise RuntimeError("Failed to parse string as Newick: %s"%ts)
RuntimeError: Failed to parse string as Newick: (https://youtube.com,&AMP;);

I'll try to add support for escaped reserved symbols as soon as possible

valevalorin commented 1 year ago

Your code sample is really interesting to me because it does something I couldn't find documentation of - how to make a tree manually (i.e. without reading it from a newick string). The process wasn't obvious, and I couldn't find an example, thus why I'm only creating nodes and getting the newick from there.

I just double checked the Cookbook and I don't see an example like the above that shows the manual process. It might be a good idea to add it to the Cookbook as well.

niemasd commented 1 year ago

Regarding the code example, great idea! I've just added such a code sample to the Cookbook, under the following:

"TreeSwift also supports manually creating a tree from scratch:"

Regarding the initial issue, I just pushed a new release of TreeSwift (v1.1.31) which should fix the issue. I only support ' (not ") to escape reserved symbols just as in the link you posted, but if the standard ever changes to support both, I'll update accordingly. You can update your version of TreeSwift as follows (if you use pip):

pip install --upgrade treeswift

It may take a couple of attempts for it to register that there's a new version

valevalorin commented 1 year ago

Thank you so much!