plotly / plotly.R

An interactive graphing library for R
https://plotly-r.com

Error bars displayed in incorrect order with color attribute assigned #762

Open saudiwin opened 8 years ago

saudiwin commented 8 years ago

When error_x is used along with a factor variable assigned to the color attribute in plot_ly, the resulting plot displays the error bars around the wrong points. Below is a working example using source code from the plotly for R book. You can see in the plot that the standard errors are being mapped to the wrong coefficients.

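library(plotly)
library(dplyr)

m <- lm(Sepal.Length ~ Sepal.Width * Petal.Length * Petal.Width, data = iris)

## Order coefficients by estimate and bin them into three colour groups
d <- broom::tidy(m) %>%
  arrange(desc(estimate)) %>%
  mutate(term = factor(term, levels = term),
         one_col = cut(estimate, 3, labels = c("Low", "Medium", "High")))

## Error bars end up around the wrong points once color = ~one_col is set
plot_ly(d, x = ~estimate, y = ~term, color = ~one_col) %>%
  add_markers(error_x = ~list(array = std.error, type = "array")) %>%
  layout(margin = list(l = 200))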

harveyl888 commented 6 years ago

I came across this too. One workaround is to add each sample group via add_trace. The code below demonstrates this. Error bars in the first plot (p1) are incorrectly assigned whereas error bars in the second (p2) are correct.

library(plotly)
library(dplyr)
library(tidyr)

## Raw data
df <- data.frame(sample = rep(paste0('sample ', 1:5), 4),
                 x = rnorm(20),
                 group = rep(paste0('group ', 1:2), each = 10),
                 stringsAsFactors = FALSE
                 )

## Stats table
df2 <- df %>%
  group_by(sample, group) %>%
  summarise(avg = mean(x), sd = sd(x)) %>%
  ungroup()

## Plotly barchart with error bars.  Error bars are incorrectly assigned
p1 <- plot_ly(df2, x = ~sample, y = ~avg, color = ~group, type = 'bar', error_y = list(array = ~df2$sd))
p1

## Create individual columns for group data and errors
df3 <- df2 %>%
  gather(key, value, -c(sample, group)) %>%
  mutate(ref = paste0(group, ifelse(key == 'sd', '_sd', ''))) %>%
  select(-group, -key) %>%
  spread(ref, value)

## Plotly barchart displays error bars correctly
p2 <- plot_ly(df3, type = 'bar')
for (g in unique(df2$group)) {
  p2 <- add_trace(p2, x = df3[['sample']], y = df3[[g]], name = g, error_y = list(array = df3[[paste0(g, '_sd')]]))
}
p2
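
The same per-group idea can also be sketched without reshaping to wide format, by subsetting the long table inside the loop so that each trace receives only its own rows and its own error values. This is a minimal sketch, not a confirmed fix: the `stats` data frame and its numbers are made up for illustration, and `add_bars()` with `type = "data"` error bars is used here in place of the `add_trace()` call above.

```r
library(plotly)

# Hypothetical summary table: one row per sample/group with a mean and an sd
stats <- data.frame(
  sample = rep(paste0("sample ", 1:5), times = 2),
  group  = rep(paste0("group ", 1:2), each = 5),
  avg    = c(1.0, 2.0, 3.0, 4.0, 5.0, 1.5, 2.5, 3.5, 4.5, 5.5),
  sd     = c(0.10, 0.20, 0.30, 0.40, 0.50, 0.15, 0.25, 0.35, 0.45, 0.55),
  stringsAsFactors = FALSE
)

# Start from an empty plot and add one bar trace per group.
# Each trace gets x, y and the error array from the same subset of rows,
# so the values cannot be paired across groups.
p <- plot_ly()
for (g in unique(stats$group)) {
  rows <- stats[stats$group == g, ]
  p <- add_bars(p, x = rows$sample, y = rows$avg, name = g,
                error_y = list(type = "data", array = rows$sd))
}
# Print p (or call print(p)) to render the chart
```

Because the subsetting happens before the values reach plotly, this does not rely on how `color = ~group` splits the data internally.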

erfanv commented 5 years ago

Is this still the most viable way to get error bars to display correctly? I have run into a similar situation with grouped time-series data: the y error bars are not being associated with the correct bars.

Attached is the data and below is the code I'm running:

## nymphs.xlsx needs to be converted to CSV first
nymphs <- read.csv("nymphs.csv", header = TRUE)

ptest <- plot_ly(nymphs, x = ~week, y = ~nymphs, type = "bar", color = ~treat,
                 error_y = ~list(array = se, type = "array", color = "#000000")) %>%
  layout(xaxis = list(title = "WAT"),
         yaxis = list(title = "Means nymphs per plant"))

Below is a screenshot of what I'm getting - the large error bars should be on "WAT 10", where the bars are much larger as well. Any help would be greatly appreciated - I can't find documentation addressing this issue anywhere.

[Screenshot: Screen Shot 2019-08-19 at 5 28 44 PM]

nymphs.xlsx

ho-stil commented 5 years ago

I ran into this issue when I wanted to filter production data with crosstalk on manufacturing parameters/dates and label the scatterplot by one main parameter. To fix the position of the error bars I modified the workaround by harveyl888 (thanks for the input). Here is my modified version for crosstalk, using model data (I chose ordered instead of randomized data to get an idea of the structure behind the issue):

library(crosstalk)
library(plotly)
TestDF <- data.frame(a = c(1:9), b = c(rep(c(1:3), 3)), c = c(1:9/10), d = LETTERS[rep(c(1:3),3)])
d_list <- unique(TestDF$d)
shared_Test_all <- SharedData$new(TestDF, group = "Test")
shared_Test <- list()
p1 <- plot_ly(type = "scatter", mode = "markers")  
for (i in 1:length(d_list)) {
  shared_Test[[i]] <- SharedData$new(TestDF[TestDF$d == d_list[i],], group = "Test")
  p1 <- add_trace(p1, data = shared_Test[[i]], x = ~a, y = ~b, name = ~d, error_y = ~list(array = c))
}
bscols(list(filter_select("d", "filter by d", shared_Test_all, ~d, multiple = TRUE),
            bscols(plot_ly(data = shared_Test_all, x = ~a, y = ~b, 
                           type = "scatter", mode = "markers",
                           error_y = ~list(array = c)),
                   plot_ly(data = shared_Test_all, x = ~a, y = ~b, name = ~d, 
                           type = "scatter", mode = "markers",
                           error_y = ~list(array = c)),
                   p1)))

The resulting plots:

[Image: the three resulting plots]

The 1st plot is without labels, for reference: the error bars are at the right position. The 2nd plot shows the bug for the model data: within each group the error bars first increase with the parameter "a", instead of being "a/10" for every point. It looks like the error bar values keep the order they have without grouping, while they are matched to the scatter points in the order of the points within each group. My workaround (3rd plot): I used a list of SharedData environments, one list element for each group of labels, and assigned them all to the same crosstalk group so they can be filtered together.
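
The suspected ordering mismatch can be illustrated in base R, without plotly. This is only a sketch of one reading of the behaviour described above, not a trace of plotly's internals: it assumes the points are split into one trace per group while the error values are consumed in original row order. The names `x_by_group`, `err_correct` and `err_mismatched` are made up for the illustration.

```r
# The model data from the example above
TestDF <- data.frame(a = 1:9, b = rep(1:3, 3), c = (1:9) / 10,
                     d = LETTERS[rep(1:3, 3)])

# Points split into one trace per group: A = 1,4,7  B = 2,5,8  C = 3,6,9
x_by_group <- split(TestDF$a, TestDF$d)

# Correct pairing: the error values are split by the same grouping variable
err_correct <- split(TestDF$c, TestDF$d)

# Assumed buggy pairing: the error values stay in row order and are cut
# into consecutive chunks, one chunk per trace
sizes <- lengths(x_by_group)
err_mismatched <- split(TestDF$c, rep(names(sizes), times = sizes))

# Side-by-side comparison, one row per plotted point
comparison <- data.frame(
  d          = rep(names(sizes), times = sizes),
  a          = unlist(x_by_group, use.names = FALSE),
  expected   = unlist(err_correct, use.names = FALSE),
  mismatched = unlist(err_mismatched, use.names = FALSE)
)
print(comparison)
```

Under this assumption, the points of group A (a = 1, 4, 7) would be drawn with errors 0.1, 0.2, 0.3 rather than 0.1, 0.4, 0.7, so the errors grow with "a" inside a group instead of equalling a/10.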

bd-cameron-willden commented 8 months ago

Bump. Please fix this

RichGordon90 commented 4 months ago

Bump again - this is a very cumbersome workaround.