Zsailer opened this issue 8 years ago
This is a wonderful idea! Fantastically wonderful. I've been looking for a lightweight, portable visualization solution, and this would be it.
Would be great to have you work on it! Otherwise, I can try an implementation myself (I love the idea so much!).
If you want to work on it, let me know, and we can discuss organization and API.
For the former, I think it should go into the dendropy.dataio hierarchy, maybe in its own module, dendropy.dataio.d3writer?
Perhaps the API should follow, e.g., that of the NewickWriter in terms of what gets rendered (node labels, edge lengths, etc.). Also, support for other rendering features (colors, node shapes, edge thicknesses, tree styles, what gets collapsed, etc. etc.) should be planned in the API even if we do not get around to implementing it all?
Fantastic! I'd be happy to help!
If development would go faster through you, I don't mind doing code review and branching off your work. Whatever works best for you! Otherwise, I'd be happy to take a crack at it myself (hopefully sometime today or tomorrow).
Yeah, I agree it makes sense to have this API live in its own module in dendropy.dataio. There is likely a format specified/standardized by Vega (https://github.com/vega/vega) for this kind of data structure and all the rendering features you mentioned. This might be a good place to start planning the organization. Vega was started with D3 in mind. I'll look through their docs for ideas.
Great!
I think given my familiarity with the codebase, one approach would be for me to set up all the "scaffolding" --- i.e., the hooks into the data schema API, the basic class to handle the writing etc., and leave the main "write" method as a stub to be fleshed out. Then, if you want and have the time, you can work at translating the tree structure into the required JSON. I can work on this over this weekend or next week. In the meantime, if you are agreeable, maybe get familiar with the Vega/D3 API, features, etc. (if you are not already)?
(In reply to Zachary Sailer's comment of 3/31/16 12:33 PM.)
— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/jeetsukumaran/DendroPy/issues/46#issuecomment-204011499
Blog/Personal Pages: http://jeetworks.org/ GitHub Repositories: http://github.com/jeetsukumaran Photographs (as stream): http://www.flickr.com/photos/jeetsukumaran/ Photographs (by galleries):
Sounds good! Ping me when you have the basic hooks in place. I'll work on the JSON format and post some ideas here.
Will do!
(In reply to Zachary Sailer's comment of 3/31/16 1:18 PM.)
Ok,
I've put the scaffolding in place:
https://github.com/jeetsukumaran/DendroPy/blob/d3writer/dendropy/dataio/d3writer.py#L148-L160
The _write_tree_list() method is for any overhead/meta stuff required for a group of trees, while _write_tree() works on a single tree. You can see the corresponding methods in the NewickWriter class for examples of how this is handled.
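To make the split between the two methods concrete, here is a minimal stand-alone sketch. It is not DendroPy's actual DataWriter API: the class name SketchD3Writer and the (name, children) tuple input are hypothetical, chosen only so the example runs on its own.

```python
import io
import json


class SketchD3Writer:
    """Hypothetical illustration of the collection-level vs. single-tree split.

    Not DendroPy's real writer API. Trees are assumed to be nested
    (name, [subtrees]) tuples purely to keep the example self-contained.
    """

    def __init__(self, indent=None):
        self.indent = indent

    def _write_tree_list(self, stream, tree_list):
        # Collection-level "overhead": wrap every rendered tree in one JSON array.
        rendered = [self._write_tree(tree) for tree in tree_list]
        stream.write(json.dumps(rendered, indent=self.indent))

    def _write_tree(self, tree):
        # Single-tree work: recursively build the nested {"name", "children"} dict.
        name, children = tree
        return {"name": name, "children": [self._write_tree(c) for c in children]}


buf = io.StringIO()
SketchD3Writer()._write_tree_list(buf, [("A", [("B", []), ("C", [])])])
print(buf.getvalue())
```

Keeping the array wrapping in _write_tree_list means _write_tree stays reusable for a single tree.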
Test framework in place at:
https://github.com/jeetsukumaran/DendroPy/blob/d3writer/dendropy/test/test_dataio_d3_writer.py
Test design is going to take some thinking. Typically, the approach has been to round-trip read-write-read, and then confirm that the objects of second reading semantically correspond to the objects of the first reading. Lots of infrastructure to support this.
With a write-only paradigm here, we might have to do a brute-force / dumb approach, i.e., check if the generated strings match exactly what is expected. This works, but is fragile -- i.e., non-semantic changes in the rendering pipeline will break the test (e.g., placement of spaces, newlines, etc.). But that's not a deal-breaker, I suppose, being only majorly annoying in the main development phase and usually easily-fixable. I am open to other suggestions if you have any.
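One possible middle ground, assuming the writer always emits valid JSON, is to parse both the generated and the expected output and compare the resulting Python objects, so whitespace and newline changes no longer break tests. A sketch (the helper name assert_same_json is hypothetical):

```python
import json


def assert_same_json(actual_text, expected_text):
    """Compare two JSON documents by parsed value, not by raw string.

    Whitespace and indentation differences are ignored because both sides
    go through json.loads. Object key order is ignored (dict equality), but
    array order still matters, so child ordering is still tested.
    """
    actual = json.loads(actual_text)
    expected = json.loads(expected_text)
    assert actual == expected, f"JSON mismatch:\n{actual!r}\n!=\n{expected!r}"


compact = '{"name": "A", "children": [{"name": "B", "children": []}]}'
pretty = """{
    "name": "A",
    "children": [
        {"name": "B", "children": []}
    ]
}"""
assert_same_json(compact, pretty)  # passes: only whitespace differs
```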
I might find time to work on the actual D3 composition implementation next week or later. If you want to give it a go in the meantime, that would be great!
-- jeet
Awesome, thanks for getting that in place! I've forked the d3writer branch and should get some time to work on it today/tomorrow.
Hi @jeetsukumaran
Sorry for the long delay on this. The summer proved to be a busy time for me. But I was able to work on this idea yesterday. I've even got a prototype branch here.
This is far from finished. Note, I haven't added most of the keyword arguments, nor have I written tests. I was mostly familiarizing myself with DendroPy's data I/O API.
I did make a simple, static example of my working branch visible in this gist.
Currently, I construct a nested dictionary for a tree. This basically converts the Tree object into a hierarchical metadata dictionary. Then, I use Python's standard json library to encode the metadata as a JSON string. This library handles all None-to-null, True-to-true, and False-to-false conversions.
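A quick illustration of those conversions with the standard library (the keys here are made up for the example):

```python
import json

# Python's json module maps None -> null, True -> true, False -> false.
node = {"name": "A", "label": None, "is_leaf": False, "is_root": True}
encoded = json.dumps(node)
print(encoded)
# {"name": "A", "label": null, "is_leaf": false, "is_root": true}

# Decoding maps them back to the Python singletons.
assert json.loads(encoded) == node
```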
It appears that there is not yet a clear, standardized JSON/Vega grammar defined for hierarchical, tree-like data. There is a conversation going on within the "Open Tree of Life" group. In summary, Vega is starting to make a point of flattening JSON formats for readability, which doesn't work well for hierarchical data. My opinion is that we stick to the examples from D3. Every node has a "name" and a "children" key-value pair. The "children" value is an array of child nodes. Child nodes have a "parent" key pointing back to the parent's name. I think, at the very least, this basic structure is acceptable to the general user base:
{
    "name": "A",
    "children": [
        {
            "name": "B",
            "children": []
        },
        {
            "name": "C",
            "children": []
        }
    ]
}
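A minimal sketch of turning a tree into that nested structure, assuming DendroPy-style nodes that expose child_nodes(), taxon (None or an object with a label) and label. StubNode and StubTaxon are hypothetical stand-ins so the example runs without DendroPy; with DendroPy installed, passing tree.seed_node should in principle work the same way.

```python
import json


def node_to_d3(node):
    """Convert a DendroPy-style node into D3's nested {"name", "children"} dict.

    Assumes the node exposes child_nodes(), taxon (None or object with .label)
    and label. Recursive, so extremely deep trees could hit Python's
    recursion limit.
    """
    if node.taxon is not None:
        name = node.taxon.label
    else:
        name = node.label  # internal nodes often have no taxon
    return {
        "name": name,
        "children": [node_to_d3(child) for child in node.child_nodes()],
    }


class StubTaxon:
    # Hypothetical stand-in for a taxon object; only .label is used.
    def __init__(self, label):
        self.label = label


class StubNode:
    # Hypothetical stand-in for a DendroPy node (only what node_to_d3 uses).
    def __init__(self, label=None, taxon=None, children=()):
        self.label = label
        self.taxon = taxon
        self._children = list(children)

    def child_nodes(self):
        return list(self._children)


root = StubNode(label="A", children=[StubNode(taxon=StubTaxon("B")),
                                     StubNode(taxon=StubTaxon("C"))])
print(json.dumps(node_to_d3(root), indent=2))
```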
Other items can be specific to DendroPy but don't affect the generality of the output. For example, annotations, if not suppressed, are included as an "annotations" key mapped to a set of key-value pairs for each annotation. Branch lengths, if not suppressed, are included as "length" in each child node's data.
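Extending that idea, the optional "length" and "annotations" keys could be toggled by flags. The flag names below are modeled loosely on NewickWriter's suppress_* keywords, and SimpleNode is a hypothetical stand-in; DendroPy stores these values differently (for example, edge lengths live on node.edge.length).

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SimpleNode:
    # Hypothetical minimal node type used only for this illustration.
    name: str
    length: Optional[float] = None
    annotations: Dict[str, object] = field(default_factory=dict)
    children: List["SimpleNode"] = field(default_factory=list)


def to_json_dict(node, suppress_edge_lengths=False, suppress_annotations=False):
    """Render a node as a D3-style dict, optionally adding "length"/"annotations"."""
    out = {"name": node.name}
    # The root usually has no incoming branch, so a None length is skipped.
    if not suppress_edge_lengths and node.length is not None:
        out["length"] = node.length
    if not suppress_annotations and node.annotations:
        out["annotations"] = dict(node.annotations)
    out["children"] = [
        to_json_dict(c, suppress_edge_lengths, suppress_annotations)
        for c in node.children
    ]
    return out


tree = SimpleNode("A", children=[
    SimpleNode("B", length=0.1, annotations={"support": 95}),
    SimpleNode("C", length=0.2),
])
print(json.dumps(to_json_dict(tree)))
print(json.dumps(to_json_dict(tree, suppress_edge_lengths=True, suppress_annotations=True)))
```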
Finally, as a small aside: I'm thinking we should rename the writer to JSONWriter instead of D3Writer. I think this type of output is more general than just D3. It has other advantages, like portability to other languages. I also find the JSON format more human-readable than other tree formats. It doesn't matter to me too much. If you'd rather keep the focus on D3, I'm fine with that as well.
Great stuff!
Really like what is happening here.
I agree with you that readability is nice, but really, really, really, really, really, should not take priority over usability. And there is strong reason to keep hierarchical data hierarchical. At the same time, we are (presumably?) not out to create a new data format, but rather render the data model in a format that can be consumed by an existing visualization technology, and it makes sense to use the format/conventions/standards/expectations of the visualization technology that we are targeting to condition output. TL;DR: I agree, stick to D3 examples!
As far as the naming goes, given that I imagine at some point we would like to take advantage of some D3-specific expression capability, my suggestion would be to have a JsonWriter class that handles all generic JSON stuff, and a D3JsonWriter that specializes it. Client code would then specify schema="json-d3" or schema="d3" (for example; we can decide the name later) to render the tree as D3-specific JSON, and schema="json" for more generic JSON (if we want to support that). The D3JsonWriter would, of course, override the _write etc. methods as needed, and also call on the base class _write as needed. I think the added complexity of the class hierarchy is offset by the gains in modularity, abstraction, and DNRY-ness?
But that is just my suggestion. If you feel that simply renaming it "JSONWriter" makes more sense (with, maybe, the optional specification of a keyword dialect="D3" to activate D3-specific rendering), then that would be the way to go!
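A rough sketch of the proposed hierarchy, assuming trees arrive as nested dicts. The method names _render and _write, the width/height extras, and the WRITERS schema table are all hypothetical; they only illustrate a subclass reusing the base-class rendering and a schema string selecting the writer.

```python
import json


class JsonWriter:
    """Hypothetical generic writer for nested {"name", "children"} dicts."""

    def _render(self, node):
        # Shared recursive rendering, inherited unchanged by subclasses.
        return {
            "name": node["name"],
            "children": [self._render(c) for c in node.get("children", [])],
        }

    def _write(self, tree):
        return json.dumps(self._render(tree))


class D3JsonWriter(JsonWriter):
    """Hypothetical D3 specialization that wraps the generic output."""

    def __init__(self, width=600, height=400):
        self.width = width
        self.height = height

    def _write(self, tree):
        # Reuse the inherited rendering, then add visualization settings.
        return json.dumps({
            "width": self.width,
            "height": self.height,
            "tree": self._render(tree),
        })


# Hypothetical schema-name table; the final names are still undecided.
WRITERS = {"json": JsonWriter, "json-d3": D3JsonWriter, "d3": D3JsonWriter}


def write(tree, schema):
    return WRITERS[schema]()._write(tree)


tree = {"name": "A", "children": [{"name": "B"}, {"name": "C"}]}
print(write(tree, "json"))
print(write(tree, "d3"))
```

Only the D3-specific wrapping lives in the subclass, so generic changes to node rendering apply to both schemas at once.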
Yes, I definitely agree that we aren't trying to create a new data format (haha). I'm just surprised that there isn't a defined "tree" grammar for JSON out there already (at least not one that I could find immediately). It seems like there are so many great visualization tools that are prime for such a grammar. By including such a writer in DendroPy, we might be inadvertently contributing to the creation of such a grammar.
I agree with keeping the hierarchy in the output, and D3 seems to honor that as well.
I really like the idea of subclassing a JsonWriter class. I'll add that to my next implementation. I think D3 is one use case (and likely the most popular use case) of a JSON format. It would be great to connect DendroPy to fresh visualization tools like D3. A lot of these tools are written as JavaScript libraries, so JSON is the natural porting mechanism. A subclassed writer would likely include extra visualization attributes (e.g., window size, colors, etc.). The more general JSON format, however, would be useful for porting DendroPy tree data to other APIs or languages. I'm saying this for selfish reasons ;)
Thanks for talking through this stuff! I'll keep working on it and keep you updated!
Also, in the interest of making a general JSON format that is portable, would a JsonReader class make sense as well? Or do you think this is outside the scope of DendroPy?
A reader would indeed be nice. I imagine the use case would be more limited, especially if the JSON is narrowly defined (DendroPy/D3-specific)? Though, having a reader will be useful for tests (round-tripping). Just as relevant, with projects like this, it is not always necessary to stick exclusively to what is important/needed/useful; I always tend to work in things that I like/want, even if it is very idiosyncratic and done more for interest than utility. So if the idea of writing a reader appeals to you, go for it!
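If a reader were added, the round-trip testing strategy could look roughly like this. Node, read_d3_json, write_d3_json and same_tree are hypothetical names for illustration; a real implementation would build DendroPy Tree and Node objects instead.

```python
import json


class Node:
    # Hypothetical minimal node; a real reader would build DendroPy objects.
    def __init__(self, name, children=None):
        self.name = name
        self.children = children if children is not None else []


def read_d3_json(text):
    """Parse D3-style nested JSON ({"name", "children"}) into Node objects."""
    def build(d):
        return Node(d.get("name"), [build(c) for c in d.get("children", [])])
    return build(json.loads(text))


def write_d3_json(node):
    """Serialize Node objects back into the same nested JSON layout."""
    def render(n):
        return {"name": n.name, "children": [render(c) for c in n.children]}
    return json.dumps(render(node))


def same_tree(a, b):
    """Semantic comparison: same names, same child order, same shape."""
    return (a.name == b.name
            and len(a.children) == len(b.children)
            and all(same_tree(x, y) for x, y in zip(a.children, b.children)))


# Round trip: read -> write -> read, then compare the two in-memory trees.
source = '{"name": "A", "children": [{"name": "B", "children": []}, {"name": "C", "children": []}]}'
first = read_d3_json(source)
second = read_d3_json(write_d3_json(first))
assert same_tree(first, second)
```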
WRT the JSON tree grammar/data format, the OTOL folks are using a JSON-based derivation of NeXML. I've been meaning to write a DendroPy parser for it, but it's been on the back burner. Not saying that it should be used here, but mentioning it for reference or as a source of ideas.
Hey! @jeetsukumaran
I wanted to mention a new project I've been working on here. PhyloVega (https://github.com/Zsailer/phylovega) is a Python package that uses Vega's (JSON) specifications to draw interactive trees.
While writing this package, I (finally) figured out the Vega grammar for drawing trees. I think it's pretty powerful. With Vega's declarative grammar, I can style my tree any way I want. In the example below, I read in a tree using PhyloPandas and style the tree using a declarative grammar API. Under the hood, the TreeChart object is just building a JSON spec for Vega.
from phylopandas import read_newick
from phylovega.api import TreeChart
# Read tree using PhyloPandas
df = read_newick('tree.newick')
# Construct Vega Specification
chart = TreeChart(
df,
height_scale=200,
# Node attributes
node_size=200,
node_color="#ccc",
# Leaf attributes
leaf_labels="id",
# Edge attributes
edge_width=2,
edge_color="#000",
)
# Display in Jupyter
chart.display()
This may be an interesting light-weight visualization solution for DendroPy. If I can find some time, I'll write up documentation for this grammar. Then, it would be pretty easy to write a JSON I/O tool for DendroPy. What do you think?
Hi Zach,
That sounds great. I think it makes most sense, though, for the visualization to be handled by your library, and I can work on making sure the object model and API allow for seamless interaction. That way, there is no duplication of effort and all it takes from a DendroPy user's perspective is "import phylovega; phylovega.
-- jeet
(In reply to Zachary Sailer's comment of 07/16/2018 06:02 PM.)
--
DendroPy is awesome.
I would love to see DendroPy's Tree data structures access modern tree visualization tools like D3. A simple method for writing the Tree data structure to a JSON schema would do the trick. I'm imagining JSON that follows the format described in this post.
I'd be happy to work on a PR if y'all agree it would be useful.