plotly / plotly.R

An interactive graphing library for R
https://plotly-r.com

ggplotly not rendering geom_ribbon() correctly with NA values #1060

Open GeoCrunch opened 7 years ago

GeoCrunch commented 7 years ago

Hi, ggplotly() creates broken ribbons when there are NAs in the min or max values, although the ribbon is rendered correctly in ggplot alone.

This seems to be alluded to here but since that was 3 years ago I fear it may not be an open issue.

The below is sufficient to illustrate how the plots differ.

Thanks!

library(ggplot2); library(plotly); library(dplyr)
df = data.frame(x = 1:5, y = sin(1:5), min = sin(1:5) + 1, max = sin(1:5)-1) %>%
  rbind(c(6,sin(6),NA, NA))
pl = ggplot(df, aes(x=x, y=y)) + 
  geom_line() + 
  geom_ribbon(aes(min = min, max = max))
pl
ggplotly(pl)

chrMongeau commented 7 years ago

I struggled with this for a while before realising it is an open issue. The broken ribbons seem to appear only when the missing data are at the end or in the middle of the dataset (in the middle case, the NAs effectively sit at the end of the first contiguous subset of the data):

library(ggplot2)
library(plotly)
library(dplyr)

# Base dataset

df = data.frame(x = 1:6, y = sin(1:6), min = sin(1:6) + 1, max = sin(1:6)-1)

# One missing value at the beginning of the dataset

df_1 <- df

df_1$min[1] <- df_1$max[1] <- NA

pl_1 = ggplot(df_1, aes(x=x, y=y)) + 
  geom_line() + 
  geom_ribbon(aes(min = min, max = max), alpha = 0.1)

ggplotly(pl_1)

# One missing value at the end of the dataset

df_2 <- df

df_2$min[6] <- df_2$max[6] <- NA

pl_2 = ggplot(df_2, aes(x=x, y=y)) + 
  geom_line() + 
  geom_ribbon(aes(min = min, max = max), alpha = 0.1)

ggplotly(pl_2)

# One missing value in the middle of the dataset

df_3 <- df

df_3$min[3] <- df_3$max[3] <- NA

pl_3 = ggplot(df_3, aes(x=x, y=y)) + 
  geom_line() + 
  geom_ribbon(aes(min = min, max = max), alpha = 0.1)

ggplotly(pl_3)

# Two missing values at the beginning of the dataset

df_4 <- df

df_4$min[1:2] <- df_4$max[1:2] <- NA

pl_4 = ggplot(df_4, aes(x=x, y=y)) + 
  geom_line() + 
  geom_ribbon(aes(min = min, max = max), alpha = 0.1)

ggplotly(pl_4)

# Two missing values at the end of the dataset

df_5 <- df

df_5$min[5:6] <- df_5$max[5:6] <- NA

pl_5 = ggplot(df_5, aes(x=x, y=y)) + 
  geom_line() + 
  geom_ribbon(aes(min = min, max = max), alpha = 0.1)

ggplotly(pl_5)

# Two missing values in the middle of the dataset

df_6 <- df

df_6$min[3:4] <- df_6$max[3:4] <- NA

pl_6 = ggplot(df_6, aes(x=x, y=y)) + 
  geom_line() + 
  geom_ribbon(aes(min = min, max = max), alpha = 0.1)

ggplotly(pl_6)

Unfortunately, I have no idea how to even start debugging this.

LTribelhorn commented 3 years ago

This (or at least a very similar) problem arises not only with ggplot2::geom_ribbon() but also when adding ribbons with plotly::add_ribbons().

Has anyone found a solution or a work-around for this?
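
For reference, a minimal sketch of what the add_ribbons() case might look like. The data mirror the examples earlier in the thread; the column names `ymin` and `ymax` and the trace names are assumptions made for illustration, and it has not been checked here that this produces exactly the same artefact as the ggplotly() examples.

```r
library(plotly)

# Same shape of data as the earlier examples, with two NAs in the middle
# of the band (column names ymin/ymax are illustrative choices).
df <- data.frame(
  x    = 1:6,
  y    = sin(1:6),
  ymin = sin(1:6) - 1,
  ymax = sin(1:6) + 1
)
df$ymin[3:4] <- NA
df$ymax[3:4] <- NA

# Build the ribbon directly with plotly, without going through ggplot2
p <- plot_ly(df, x = ~x) %>%
  add_ribbons(ymin = ~ymin, ymax = ~ymax, name = "band") %>%
  add_lines(y = ~y, name = "y")
p
```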

lucalamoni commented 2 months ago

Hello, I have the same problem described above. When the time series has NAs in the middle, ggplotly() does not display the data correctly. Do you know whether this issue has already been addressed? Thank you.
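
One possible direction for a workaround, offered only as a sketch: since the examples above suggest the trouble comes from NA rows inside the ribbon data, the NA rows could be dropped and each remaining contiguous stretch labelled so that it can be drawn as its own group. The helper `split_ribbon_runs()` and the `run` column are hypothetical names introduced here, not part of ggplot2 or plotly, and nothing in this thread confirms that ggplotly() renders the result correctly.

```r
# Hypothetical helper (not part of ggplot2 or plotly): drop rows where either
# bound is NA and label each remaining contiguous stretch with a run id.
split_ribbon_runs <- function(data, lower = "min", upper = "max") {
  ok <- !is.na(data[[lower]]) & !is.na(data[[upper]])
  # A new run starts at every complete row that is either the first row
  # or directly follows an incomplete row.
  starts <- ok & !c(FALSE, head(ok, -1))
  data$run <- cumsum(starts)
  data[ok, , drop = FALSE]
}

df <- data.frame(x = 1:6, y = sin(1:6), min = sin(1:6) - 1, max = sin(1:6) + 1)
df$min[3:4] <- df$max[3:4] <- NA

# Rows 3 and 4 are removed; rows 1-2 and rows 5-6 get different run ids
ribbon_df <- split_ribbon_runs(df)
ribbon_df[, c("x", "run")]
```

The filtered data could then be passed to the ribbon layer only, for example `geom_ribbon(data = ribbon_df, aes(ymin = min, ymax = max, group = run), alpha = 0.1)`, while `geom_line()` keeps using the full data frame. Whether ggplotly() then draws each run as a separate ribbon would still need to be verified.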