teunbrand / ggh4x

ggplot extension: options for tailored facets, multiple colourscales and miscellaneous
https://teunbrand.github.io/ggh4x/

stat_difference - when specifying levels: Why NA1 and NA2? #84

Closed tjebo closed 1 year ago

tjebo commented 1 year ago

Hi Teun. I ran across this slightly odd behaviour and don't really know what to make of it. I guess it's not a bug, but I'm curious about the whys and wherefores. No worries if you have no time to answer, I just felt like documenting the question. When specifying the levels in stat_difference, why does it show "NA1" and "NA2"? To me, a missing value is a missing value, and giving it a level gives it an importance that it might not have.

On another, more conceptual note, I am not sure about the meaning of "0" in the fill scale. There will never be a fill in this case, right? (Correct me if I am wrong!) So why represent it in the scale (in blue)?

library(ggh4x)
#> Loading required package: ggplot2
df <- data.frame(x = 1:5, ymin = 1:5, ymax = 5:1)

# Default levels
ggplot(data = df, aes(x)) +
  stat_difference(aes(ymin = ymin, ymax = ymax))


# Two levels specified
ggplot(data = df, aes(x)) +
  stat_difference(aes(ymin = ymin, ymax = ymax), levels = c("+", "-"))


# Only one level specified
ggplot(data = df, aes(x)) +
  stat_difference(aes(ymin = ymin, ymax = ymax), levels = c("+"))

Created on 2022-11-24 with reprex v2.0.2

teunbrand commented 1 year ago

Hi Tjebo!

To be honest, I don't really recall why I chose to do it this way. Thinking back, I think it might have to do with people, in some weird specific circumstance, needing the 0 level. For example, if you want the colour to depend on the sign:

library(ggh4x)
#> Loading required package: ggplot2
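# ymin equals ymax at x = 2, 3 and 4, so the difference is zero there
df <- data.frame(
  x = 1:5,
  ymin = c(1, 2, 2, 2, 3),
  ymax = c(3, 2, 2, 2, 1)
)

ggplot(data = df, aes(x)) +
  stat_difference(
    aes(
      ymin = ymin,
      ymax = ymax,
      # draw the outline in the same colour as the final fill
      colour = after_scale(fill)
    )
  )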

Created on 2022-11-24 by the reprex package (v2.0.1)

You're right that this is probably a rare use case, and now I'm wondering if I should add a drop_equal argument that defaults to TRUE, to get rid of the 0 in most cases.
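To make the role of the 0 level concrete, here is a rough base-R sketch of classifying each row by the sign of the difference. It is not the actual ggh4x implementation: the variable names, the label order and the choice that "+" means ymax > ymin are all assumptions made for illustration.

```r
# Same data as in the example above
df <- data.frame(
  x    = 1:5,
  ymin = c(1, 2, 2, 2, 3),
  ymax = c(3, 2, 2, 2, 1)
)

# sign() returns 1, 0 or -1, so there are three possible groups
difference_sign <- sign(df$ymax - df$ymin)

# Hypothetical labels; assumes "+" stands for ymax > ymin
sign_labels <- c("+", "-", "0")
df$sign <- factor(difference_sign, levels = c(1, -1, 0), labels = sign_labels)

# Count the rows per group; the "0" group holds the rows where ymin == ymax
table(df$sign)
```

Rows where ymin equals ymax span no area, so they have no visible fill, but they still carry a sign of their own. That is the group an outline colour could pick up.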

With regard to the NA1 and NA2 levels, I just figured that something must represent missing levels and I wasn't quite sure what. If you have a good idea about how missing levels should be handled, I'd be glad to hear it!

Best, Teun