jeetsukumaran / DendroPy

A Python library for phylogenetic scripting, simulation, data processing and manipulation.
https://pypi.org/project/DendroPy/.
BSD 3-Clause "New" or "Revised" License
210 stars 61 forks source link

jplace style edge numbering #49

Closed wwood closed 8 years ago

wwood commented 8 years ago

Hi there,

I was looking for a way to parse the jplace style newick-ish trees, described at http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0031009

which involves adding an edge numbering to the format e.g.

t1 = Tree.get(data='((A:.01{0}, B:.02{1})D:.04{3}, C:.05{4}){5};', schema='newick')

print(t1.edges()[0].edge_number)
# => 5

print([  [number,edge.length] for number,edge in enumerate(t1.edge_index)  ])
# => [[0, 0.01], [1, 0.02], [2, 0.04], [3, 0.05], [4, None]]

I've not included any test code or anything because I'm new here, and I wanted to see if I implemented it in a way you approve of first.

Thanks, ben

jeetsukumaran commented 8 years ago

This looks like a good approach.

(1) The only thing is that I would like to ask that you make the parsing optional, i.e.,

elif self.is_parse_jplace_tokens and nexus_tokenizer.current_token == '{':

with, of course, the appropriate setup in the constructor:

self.is_parse_jplace_tokens = kwargs.pop("is_parse_jplace_tokens", None)

I am not to picky about the default. On the one had, it is not a widespread convention to warrant parsing by default. On the other, it does not seem to do much harm to have the parsing on by default, as if the construct is given in the data, it is unlikely that the user wanted anything else. If I had to pick, I would default to None or False.

(2) It would be GREAT if you could implement a writer as well. This makes testing really easier, and you can round-trip the data. Of course, I understand if this is not an immediate priority for you, and can be done later.

-- jeet

On 4/26/16 12:20 AM, Ben J Woodcroft wrote:

Hi there,

I was looking for a way to parse the jplace style newick-ish trees, described at http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0031009

which involves adding an edge numbering to the format e.g.

t1= Tree.get(data='((A:.01{0}, B:.02{1})D:.04{3}, C:.05{4}){5};',schema='newick')

print(t1.edges()[0].edge_number)

=> 5

print([ [number,edge.length]for number,edgein enumerate(t1.edge_index) ])

=> [[0, 0.01], [1, 0.02], [2, 0.04], [3, 0.05], [4, None]]

I've not included any test code or anything because I'm new here, and I wanted to see if I implemented it in a way you approve of first.

Thanks, ben


    You can view, comment on, or merge this pull request online at:

https://github.com/jeetsukumaran/DendroPy/pull/49

    Commit Summary

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/jeetsukumaran/DendroPy/pull/49


Jeet Sukumaran

jeetsukumaran@gmail.com

Blog/Personal Pages: http://jeetworks.org/ GitHub Repositories: http://github.com/jeetsukumaran Photographs (as stream): http://www.flickr.com/photos/jeetsukumaran/ Photographs (by galleries):

http://www.flickr.com/photos/jeetsukumaran/sets/

wwood commented 8 years ago

Hey, thanks for the tips - I've now implemented them and added some tests. How'd I do? I decided to default to not parsing since these are not newick format trees. Maybe could add a message to the user when a stray { is detected and jplace parsing is disabled?

The writer might come - presumably not much code change needed to write them out, I would just need to spend some time familiarising myself with the code; currently I've not even looked at the writer code.

jeetsukumaran commented 8 years ago

Looks good! Thanks! I've merged this into "development-master" (aface4b) and it will be included in the next release. Appreciate your contribution!

wwood commented 8 years ago

excellent, thanks.