talgalili / heatmaply

Interactive Heat Maps for R Using plotly
377 stars 73 forks source link

row/column label overplotting #78

Open alanocallaghan opened 7 years ago

alanocallaghan commented 7 years ago

"The overall display needs adjustment after some zooming in/out practices. The row names overlay with each other when number of rows are increased (say more than 160)."

talgalili commented 7 years ago

The way to deal with this is if we could make the following work with dendrograms:

library(ggplot2)
library(plotly)
library(heatmaply)

heatmaply:::ggplot_heatmap(as.matrix(iris[,-5])) %>% ggplotly %>%
 layout(yaxis = list(tickmode='auto'),
              xaxis = list(tickmode='auto'))

I've asked about something similar here: https://github.com/ropensci/plotly/issues/1011

But I'm not sure how to make this work.

alanocallaghan commented 7 years ago

There is basically only one way to make it work; this would be to round the x co-ordinates of the segments of the dendrograms to allow them to fit on a categorical scale (ie, the same scale as the main heatmap).

This results in a non-conventional (but not wrong) dendrogram

screenshot from 2017-06-12 19-42-41

talgalili commented 7 years ago

mmm, interesting. Is there no change that Carson could introduce that would save us from this? (I fear it would make the users nervous).

alanocallaghan commented 7 years ago

It comes down to mixing categorical and numeric co-ordinate systems; you need to be able to provide common co-ordinates for the two axes. Thus, we need both to be numeric, and to manually specify the tick labels for the heatmap (the current default on master), or for both to categorical, and allow automatic labelling (the current implementation for this issue, when using plot_method="plotly" on branch tickmode.

I don't think Carson can change this; I'm pretty sure it's a limitation of the javascript library, since all of these parameters are passed directly to the javascript library. It might be worth raising the issue over at plotly.js HQ, but it's a bit of an edge case so might not have that much use or support.

alanocallaghan commented 7 years ago

Actually, it might be worth raising the issue re: sampling of axis labels on the plotly.js repo, though it won't be quick to be added I imagine

alanocallaghan commented 7 years ago

https://github.com/plotly/plotly.js/issues/1779

talgalili commented 7 years ago

I think it is worth making the case for it in the plotly repo (and even asking Carson to pay attention to it). This feature would help make the package more scalable for larger matrix while still keeping the visibility of the side labels usable.

alanocallaghan commented 7 years ago

As I said it's not a plotly R library issue; these are options which are passed directly to the underlying javascript, so it's in the hands of Etienne Pinard and co

talgalili commented 7 years ago

I meant the plotly.js repo :) (and asking Carson to pay attention was so that it would hopefully get more attention from Etienne).

cpsievert commented 7 years ago

Try using ggplotly(dynamicTicks = TRUE)

alanocallaghan commented 7 years ago

Not what I would term desired output screenshot from 2017-06-12 21-51-22

cpsievert commented 7 years ago

I'd consider this a bug in the R pkg. Should only be a problem when there is a colorbar and categorical axes.

cpsievert commented 7 years ago

see https://github.com/ropensci/plotly/issues/1045

talgalili commented 7 years ago

Hi @cpsievert , I've just updated to the latest plotly version which includes your patch, but this doesn't seem to solve the issue. I've added a new dynamicTicks argument to heatmaply so you could more easily debug this:

devtools::install_github('talgalili/heatmaply')
devtools::install_github("ropensci/plotly")

library(heatmaply)
# heatmaply(mtcars)
heatmaply(mtcars, dynamicTicks = TRUE)

Output:

image

cpsievert commented 7 years ago

I don't think this is a problem I can solve from plotly. You're using supplying different data for the x/y positions (discrete for the heatmap and numeric for the dendrogram). My advice would be to transform the dendrogram x/y to the corresponding discrete values when dynamicTicks = TRUE

alanocallaghan commented 7 years ago

@cpsievert, I agree that's the most sensible way to do this, and I've implemented this with plotly. However I'm still having some problems with ggplotly. A reprex is below which you can use with the develop branch on this repo if you have time.

## Working
heatmaply(iris[, -5], dynamicTicks=TRUE, plot_method="plotly")

## broken
heatmaply(iris[, -5], dynamicTicks=TRUE)

## For debugging, return the ggplot objects
h <- heatmaply(iris[, -5], dynamicTicks=TRUE, return_ppxpy=TRUE)

## Everything seems in order...
h$p
h$px
h$py

## Subplotting works!
subplot(h$py, plotly_empty(), h$p, h$px, nrows=2)

## Wrong, but not the kind of wrong we're seeing...
subplot(h$py, plotly_empty(), ggplotly(h$p, dynamicTicks=TRUE), h$px, nrows=2)

## Gotcha!
subplot(h$py, plotly_empty(), ggplotly(h$p, dynamicTicks=TRUE), h$px, nrows=2, shareX = TRUE, shareY = TRUE)

So the issue seems to be sharing axes... I think it was okay before since the axis types were different (categorical and numeric). Do I need to change the axis IDs to control which axes are changed?

@talgalili since I've now fixed this for plot_method="plotly" I'll probably call it a day for the revision unless we can make some progress on this.

cpsievert commented 7 years ago

So the issue seems to be sharing axes

Sure, but why should subplot() do the thing you're expecting it to do when h$p has discrete x/y scales and h$py/h$px has numeric x/y scales? You're trying to share two different scales...

alanocallaghan commented 7 years ago

I edited the dendrograms to use a discrete scale to match with the heatmap's discrete scale; admittedly I had left coord_cartesian()/coord_flip() calls in that might have been causing problems, but removing them hasn't helped.

pg <- ggplot_build(h$p)
pg$layout$get_scales()
# $x
# <ggproto object: Class ScaleDiscretePosition, ScaleDiscrete, Scale>
# [etc]

pxg <- ggplot_build(h$px)
pxg$layout$get_scales(1)
# $x
# <ggproto object: Class ScaleDiscretePosition, ScaleDiscrete, Scale>
# [etc]

Whereas for a numeric scale:

g <- ggplot(mtcars, aes(x=mpg, y=cyl)) + geom_point()
gb <- ggplot_build(g)
gb$layout$get_scales(1)
# $x
# <ScaleContinuousPosition>
# [etc]
# $y
# <ScaleContinuousPosition>
alanocallaghan commented 7 years ago

Also it's worth noting that subplot() does do the right thing, just not when dynamicTicks is enabled:

subplot(h$py, plotly_empty(), h$p, h$px, nrows=2, shareX=TRUE, shareY=TRUE)

cpsievert commented 7 years ago

Sorry, I meant to say the data that you're supplying to x/y are different:

unique(h$p$data$column)
#> [1] Sepal.Length Petal.Length Sepal.Width  Petal.Width 
#> Levels: Sepal.Length Petal.Length Sepal.Width Petal.Width
unique(h$py$layers[[1]]$data$x)
#> [1] 2.125 1.000 3.250 2.500 2.000 3.000 4.000

Thinking about this a bit more, I'm not actually sure a plot like this (with the auto-updating ticks) is possible in plotly.js right now. And to that point, I should mention that the dynamicTicks=TRUE bit in heatmaply(iris[, -5], dynamicTicks=TRUE, plot_method="plotly") does nothing. The reason why it 'works' is because you're using tickvals/ticktext, which is not "dynamic" in the sense that plotly.js doesn't automatically (re)generate labels from the supplied data (but you really don't want plotly.js auto-generating labels since, in your case, the dendrogram requires positions that are between categories).

It sounds like what you need is the ability to have something like nticks to be respected when tickmode='array'

talgalili commented 7 years ago

Thanks @cpsievert

I added "dynamicTicks=TRUE" as a place-holder to play with. Seeing the correspondence here, it seems clearer that this option will not be available also in the future, so I will later remove this argument.

As for your other remark - would making "nticks be respected when tickmode='array'" something that is on your end, or requires plotly.js modification?

cpsievert commented 7 years ago

It would have to be implemented at the plotly.js level

alanocallaghan commented 7 years ago

On branch develop it is working correctly, see the example below. I've mapped the heatmap and dendrograms to the same discrete scale. I've also made the hovertext on the dendrograms show x and y co-ordinates for clarity (the heatmap is already showing that). Also, when dynamicTicks=TRUE and plot_method="plotly", tickvals and ticktext are not specified on the heatmap or dendrograms.

devtools::install_github("talgalili/heatmaply", ref="develop")
library(heatmaply)
heatmaply(iris[, -5], dynamicTicks=TRUE, plot_method="plotly")
alanocallaghan commented 7 years ago

I would note that the y axis labels are numbers, but are character objects.

rownames(iris) <- paste("row", rownames(iris))
heatmaply(iris[, -5], dynamicTicks=TRUE, plot_method="plotly")
talgalili commented 7 years ago

@Alanocallaghan - it is also worth noting that the dendrograms are not "well" mapped to the centers as they should be:

image

(also there seems to be a space between the heatmap and the dendrogram)

I think that having the nticks work for tickmode='array'" sounds to be the more reasonable solution at this point (though, sadly, it would require someone to fix it in plotly.js...)

alanocallaghan commented 7 years ago

screenshot from 2017-06-23 17-50-11 They are mapped to the centers, unfortunately rounding makes the dendrograms hard to interpret near the leaves (not that that is easy usually, but I digress).

The space between the heatmap and dendrogram is a minor issue and could easily be resolved by changing axis limits, I just haven't addressed it as this is still not resolved for plot_method="ggplot", or for side bar plots.

talgalili commented 7 years ago

Hi Alan, thanks for the explanation. Could you please see if you have ideas on how to improve this issue: https://github.com/plotly/plotly.js/issues/1812 If it could be simpler for the plotly.js people, I hope it would raise the chances they could address it sooner rather than later.

alanocallaghan commented 7 years ago

I agree that the plotly fix is more sensible, however I am convinced that something is amiss in ggplotly. I don't see why the dynamic ticks argument should scramble the x/y co-ordinates, as shown above...

I am not optimistic about plotly.js fixing this in time for the revision, so my inclination would be to implement this bodge/hack as an option, and then roll out a proper fix when the nticks issue is addressed

@cpsievert running your code on the develop branch:

unique(h$p$data$column)
#[1] Sepal.Length Petal.Length Sepal.Width  Petal.Width 
#Levels: Sepal.Length Petal.Length Sepal.Width Petal.Width
unique(h$py$layers[[1]]$data$x)
#[1] Petal.Length Sepal.Length Sepal.Width  Petal.Width 
#Levels: Sepal.Length Petal.Length Sepal.Width Petal.Width