Open gregcaporaso opened 10 years ago
And I think from_linkage
is higher priority than to_linkage
as we have an immediate use-case for the former. If the latter is easy, we should just do it now, but if it would hold up getting these changes into 0.1.0, we should do to_linkage
at a later time.
Looks like we never did to_linkage
. This would be really useful, as it'd let us plot with the SciPy dendrogram function, which has problems (#531) but is better than ascii_art
. See this recipe for an example of why this would be good.
@ebolyen, would you be able to do this since you did the from_linkage
method?
Adding the to_linkage
method would be extremely helpful. For example, it would be really nice to pass a linkage matrix into seaborn.clustermap
to align a tree with a heatmap.
Agree, is anyone available to work on to_linkage
?
I'll need this for a paper anyways, so I'll probably have something ready within a month.
Awesome! Is it okay if I assign this issue to you?
Sure thing.
On Thu, Aug 11, 2016 at 3:42 PM, Jai Ram Rideout notifications@github.com wrote:
Awesome! Is it okay if I assign this issue to you?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/biocore/scikit-bio/issues/346#issuecomment-239269210, or mute the thread https://github.com/notifications/unsubscribe-auth/AD_a3UOWPW5CeO5IcltUBJ475GbpTlW1ks5qe3sOgaJpZM4B67pq .
This will be really great to get in there.
Hi @gregcaporaso @jairideout . I've been looking at this more carefully.
It turns out that this is a really hard problem.
Sorry to be the bearer of bad news - but it looks like we would lose the branch length information if we cast to this object. But if we really wanted this, it is probably possible to do this cast by setting all of the distances to 1 or something. Even with this scheme, it is tricky to figure out how to get the right matrix, given the traversal issue.
@mortonjt, branch lengths can be encoded, and siblings do not have to have identical distances to their parents (except for tips), however the tree does have to be an ultrametric binary tree (strictly bifurcating with every tip being equidistant to the root).
The linkage isn't exactly a reversed depth-first search, it actually is ordered by shortest subtree (you should see distances increase as you traverse the linkage matrix as the distance recorded represents twice the tip to root so far, or the distance between two "clusters").
I think we should have some way of validating that a given tree is bifurcating and ultrametric, and raise an error otherwise.
@ebolyen you are right - this method will only work properly on ultrametric bifurcating trees. I didn't know about the shortest subtree ordering. Thanks!
This may of interest in the future -- particularly now that seaborn remains to be the best plotting program for dendrogram heatmaps.
The scipy linkage object is returned by the scipy hierarchical clustering methods, so being able to load or generate these from TreeNode's provides the glue between scipy's hierarchical clustering and skbio's TreeNode.
@ebolyen is working on this. @wasade, would you be available to review?