tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org

Data with a lot of zeros, a boxplot and scale log2 fail #247

Closed apepper closed 12 years ago

apepper commented 13 years ago

Hi there. When I have a data frame that contains a lot of zeros, so that all quartiles are zero and no outliers exist, ggplot2 fails when I try to plot it with a log2 scale. Without the log2 scale it works fine. I'm running R version 2.13.2 and ggplot2 version 0.8.9.

Minimal Example:

library(ggplot2)

DF <- data.frame(
  time = factor(c(1,2,1,2,1,2,1,2,1,2)),
  value = c(0,0,0,0,0,0,0,0,0,10))

ggplot(DF, aes(time, value)) + geom_boxplot() + scale_y_log2()

Error message (translated from German):

Error in if (any(outliers)) stats[c(1, 5)] <- range(y[!outliers], na.rm = TRUE) : 
  missing value where TRUE/FALSE needed

hadley commented 13 years ago

What behaviour do you expect? Log scales don't work with zeros.

apepper commented 13 years ago

What I found so far: 2^x can never be zero ("simple" math), so log2(0) has no finite value. As a result, outliers in StatBoxplot->calculate is NA NA NA NA NA.
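For anyone reading along, here is a minimal base R sketch of how the NA values likely arise. It is not the actual StatBoxplot code: fivenum() stands in for the quantile computation, and the 1.5 coefficient is assumed to be the usual boxplot default.

```r
# Group time == 1 from the minimal example: every value is zero.
y <- log2(c(0, 0, 0, 0, 0))   # each element becomes -Inf

# Stand-in for the quantile step (assumption: ggplot2 computes its own quantiles).
stats <- fivenum(y)           # min, lower hinge, median, upper hinge, max: all -Inf
iqr <- stats[4] - stats[2]    # -Inf - (-Inf) is NaN

# Comparisons against NaN yield NA, so every outlier flag is NA.
outliers <- y < (stats[2] - 1.5 * iqr) | y > (stats[4] + 1.5 * iqr)

# any() of all-NA input is NA, and `if` cannot branch on NA,
# so the condition raises an error that is captured here as a string.
result <- tryCatch(
  if (any(outliers)) "has outliers" else "no outliers",
  error = function(e) conditionMessage(e)
)

print(outliers)
print(result)
```

The captured message in result is the locale-specific version of the error quoted in the report.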

So this is more of a general question: how should one deal with zeros in a data set that is meant to be shown on a logarithmic scale?

BrianDiggs commented 13 years ago

> log2(0)
[1] -Inf

It is impossible to draw a finite scale that goes to infinity. If a set has zeros, then a logarithmic transformation does not make any sense. Perhaps with context, an alternative can be suggested, but this is not a bug. Consider posting on the ggplot mailing list or r-help list for general discussion of this problem.
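As a rough illustration of what such alternatives might look like, here is a base R sketch of two common workarounds. Both are assumptions about what is acceptable for the data, not recommendations from this thread; the column names shifted and log2_value are made up for the example.

```r
DF <- data.frame(
  time = factor(c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)),
  value = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 10))

# Option 1 (assumption: shifting by 1 is acceptable for the data):
# log2(value + 1) maps 0 to 0 instead of -Inf.
DF$shifted <- log2(DF$value + 1)

# Option 2 (assumption: zeros may be dropped before transforming):
positive <- subset(DF, value > 0)
positive$log2_value <- log2(positive$value)

print(range(DF$shifted))
print(positive)
```

Either transformed column could then be plotted on an ordinary linear scale, for example with ggplot(DF, aes(time, shifted)) + geom_boxplot(). Whether a shift or a filter is appropriate depends on what the zeros mean in context.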

apepper commented 13 years ago

I see that this is more a math problem than a ggplot2 problem. But one thing that could be improved is the error message, so that the user learns that log2(0) is not a good idea. The current error message is quite cryptic (Error in if (any(outliers)) stats[c(1, 5)] <- range(y[!outliers], na.rm = TRUE)).
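To make the suggestion concrete, here is a sketch of the kind of up-front check that would give a clearer message. check_log2_input is a hypothetical helper written for this comment; it is not part of ggplot2, and the wording of the message is only an example.

```r
# Hypothetical helper (not a ggplot2 function): fail early with a readable
# message when the data contain values a log2 scale cannot represent.
check_log2_input <- function(y) {
  n_bad <- sum(y <= 0, na.rm = TRUE)
  if (n_bad > 0) {
    stop("log2 scale requires strictly positive values; found ",
         n_bad, " value(s) <= 0", call. = FALSE)
  }
  invisible(y)
}

# Capture the message instead of aborting, to show what a user would see.
msg <- tryCatch(
  check_log2_input(c(0, 0, 10)),
  error = function(e) conditionMessage(e)
)
print(msg)
```

A check like this would run before the statistics are computed, so the user would see the cause (non-positive input) rather than the downstream NA condition.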