thomasp85 / ggraph

Grammar of Graph Graphics
https://ggraph.data-imaginist.com
Other
1.08k stars 116 forks source link

Preserve tip lengths in dendrogram (similarly to `plot.hclust()`) #368

Open almeidasilvaf opened 8 months ago

almeidasilvaf commented 8 months ago

Hi, @thomasp85

First of all, thank you for this great package!

I've been trying to use {ggraph} to create a dendrogram that preserves tip lengths as plot.hclust() does, but I can't seem to make it work. I was wondering if there is already an easy to do this with geom_edge_elbow().

In the reprex below, I demonstrate that I can do what I want with a hacky way by using data produced by {ggdendro}, but I don't like this approach. Is there a more elegant way of doing the same using {ggraph}?

suppressPackageStartupMessages(library(ggplot2))

# Simulate data
simdata <- matrix(
    rnorm(100 * 50, mean = 10, sd = 2),
    nrow = 100
)
cormat <- cor(t(simdata))
hc <- hclust(as.dist(1 - cormat), method = "average")

# 1) The base R way
plot(hc, labels = FALSE)


# 2) The {ggraph} way
p1 <- ggraph::ggraph(
    hc, layout = "dendrogram",
    height = .data$height
) +
    ggraph::geom_edge_elbow()

p1


# 3) The {ggdendro} way
p2 <- ggdendro::ggdendrogram(hc, labels = FALSE)
p2


# 4) The hacky way: Extracting x, y, xend, and yend coords with {ggdendro}
ddata <- ggdendro::dendro_data(hc, type = "rectangle")$segments
head(ddata)
#>           x         y      xend      yend
#> 1 39.324219 1.0234122 15.273438 1.0234122
#> 2 15.273438 1.0234122 15.273438 1.0006401
#> 3 15.273438 1.0006401  6.015625 1.0006401
#> 4  6.015625 1.0006401  6.015625 0.9593656
#> 5  6.015625 0.9593656  3.125000 0.9593656
#> 6  3.125000 0.9593656  3.125000 0.8316230

ddata$yend[ddata$yend < 0.05] <- ddata$y[ddata$yend < 0.05] - 0.05

p3 <- ggplot(ddata) +
    geom_segment(
        aes(x = .data$x, y = .data$y, xend = .data$xend, yend = .data$yend), 
        linewidth = 0.3
    ) +
    coord_cartesian(ylim = c(0, NA))

p3

Created on 2024-03-25 with reprex v2.1.0

Best, Fabricio

KlausVigo commented 8 months ago

Hi @almeidasilvaf, if you apply hclust the tree is ultrametric, that means that the distance from each tip to the root is equal. So p2 is actually preserving the edge lengths, while p1 is shortening external edges. The external edges in p1 are also all the same, but if you don't want the edges to be shortened, use

plot(hc, hang=-1)

In phylogenetic hclust(..., method = "average") is better known as UPGMA. plot.hclust additionally multiplies the edges by a factor of two, so interpretation of plot.hclust is different to phylogenetic trees and easy to mislead the reader relying on default values. Here in comparison the ape representation:

library(ape)
plot(as.phylo(hc))
axisPhylo()
max(cophenetic(hc))
# The distance of the root from the tips is half the cophenetic distance.

Kind regards, Klaus

almeidasilvaf commented 8 months ago

Hi, @KlausVigo

That's interesting, I didn't know that hclust() generated ultrametric trees by default. In my case, I really want to generate plots like plot.hclust() does (with shortened external edges) from hclust objects using the ggplot2 system. Would you know a way of extracting the coordinates that plot.hclust() uses?

I could also rely on phylo objects (although, in my case, I am not working with phylogenies, it's really hierarchically clustered dissimilarity matrices) and create the plots with the {ggtree} package. The problem is that {ggtree} is extremely heavy and can easily lead to a huge increase in complexity in the dependency graph of the package where I want to implement this functionality.

Best, Fabricio