Open binkleym opened 1 month ago
Simplified reprex:
library(ggplot2)
ggplot(mpg, aes(displ, hwy, colour = drv)) +
geom_point() +
scale_colour_manual(values = c("red", "blue", NA))
The simple solution would be to use "transparent"
instead of NA
. Or any transparent colour like "#FFFFFF00"
.
ggplot(mpg, aes(displ, hwy, colour = drv)) +
geom_point() +
scale_colour_manual(values = c("red", "blue", "transparent"))
Created on 2024-06-03 with reprex v2.1.0
I think the scale is doing exactly what is supposed to do by translating NA
to na.value
.
The above is contrary to ggplot2's own documentation:
https://ggplot2.tidyverse.org/reference/aes_colour_fill_alpha.html
"An NA, for a completely transparent colour."
By the docs, NA should select transparency, not "gray50". Which I'm not asking to revert, I'm just asking that info be printed when NA's are overridden so that people don't spend hours debugging when default behavior changes.
@teunbrand Not sure I follow:
I think the scale is doing exactly what is supposed to do by translating
NA
tona.value
.
In your reprex, NA
is not a data value but a color value, i.e. it is being used after translation from data to color. Why should it be converted into na.value
? At least conceptually this seems wrong.
It's a question of ordering of operations. Do you first convert all NA
s into na.value
and then convert all other data values into the respective colors or do you do it the other way round?
I haven't looked at the code, it may be a minor fix changing ordering of operations or it may be a major headache and not worth the cost to fix, but I agree it feels wrong to me the way it currently works.
Ok, seems like the issue is that the code uses NA
in pal_match
as an indication that the respective data value was not found in the palette and hence it cannot distinguish this scenario from cases where there was a match but the match returned NA
.
https://github.com/tidyverse/ggplot2/blob/28aec3a90d2798382c7da384fece6227e12092f8/R/scale-.R#L985
Definitely not correct behavior in my opinion but maybe not worth the effort to fix.
Why should it be converted into na.value?
If it shouldn't, then na.translate
should be set to FALSE
.
The discrete palette NA
handling code lives here:
Briefly, pal_match
is a translation of user values to palette values. It can be NA
because the palette contains NA
(as in this issue) or it can be NA
because there is no match between the data values and limits. The code doesn't discriminate between these two cases.
If it shouldn't, then
na.translate
should be set toFALSE
.
According to the documentation, na.translate
is about missing data values, not about colors that have been specified as NA
s. Again, I don't think this is worth fixing, but I continue to agree the behavior is not quite correct. The simplest fix might be to just tell people in the documentation to not specify colors as NA
but instead use "transparent"
.
(Simple thought experiment: I might want to translate NA
s in the data into na.value
and also use transparency for some other, non-missing data value. Clearly in this case I need na.translate = TRUE
.)
The docs of na.translate
just read 'missing values' without specifying 'data values' or 'mapped values'.
I'm not entirely convinced that the current behaviour is wrong: people might also be setting NA
s in palettes for the purpose of defaulting to na.value
and guessing user intent is hard from the available information.
I think adjusting the docs to not recommend colour = NA
but use colour = "transparent"
instead is perhaps the least invasive method of resolving this.
Here is an example to show my thought experiment is not completely contrived. The only working solution is to specify the transparent color as "transparent"
. Either way I think we agree the docs should be changed.
library(tidyverse)
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
texas_income <- readRDS(url("https://wilkelab.org/SDS375/datasets/Texas_income.rds"))
p <- texas_income %>%
mutate(
top_ten = if_else(is.na(median_income), NA, rank(desc(median_income)) <= 10)
) %>%
ggplot(aes(fill = top_ten)) +
geom_sf() +
theme_bw()
# works
p +
scale_fill_manual(
breaks = c(TRUE, FALSE, NA),
labels = c("top 10 income", "other", "no data"),
values = c(`TRUE` = "#D55E00", `FALSE` = "transparent")
)
# doesn't work
p +
scale_fill_manual(
breaks = c(TRUE, FALSE, NA),
labels = c("top 10 income", "other", "no data"),
values = c(`TRUE` = "#D55E00", `FALSE` = NA)
)
# doesn't work either
p +
scale_fill_manual(
breaks = c(TRUE, FALSE, NA),
labels = c("top 10 income", "other", "no data"),
values = c(`TRUE` = "#D55E00", `FALSE` = NA),
na.translate = FALSE
)
Created on 2024-06-03 with reprex v2.0.2
I think adjusting the docs to not recommend colour = NA but use colour = "transparent" instead is perhaps the least invasive method of resolving this.
My problem is already solved, but for other R users who might hit this issue because the default behavior has changed, I still think printing something that lets them know NA's are being overridden is the right thing. Changing documentation doesn't change existing code bases, nor does it inform users used to the older behavior that things have changed.
By letting users know what's changed and how to fix it, you get the new behavior, and they don't spend hours debugging. It's win-win for everyone.
@binkleym I suspect that's a completely different issue. Either way, can you use my example to make a simple reprex?
Life suddenly got busy, but I will as soon as I can.
I encountered a problem with scale_fill_manual where ggplot was explicitly ignoring the NA (for transparency) I was attempting to use.
The problem appears to be this commit, where the default value for NA's is set to "gray50". Instead of trusting that the user knows what they're doing when they set NA for transparency, it overrides them behind their back without letting them know.
https://github.com/tidyverse/ggplot2/commit/c9adeeddca2ecba9162f231a5c5d326f103e691b
It was quite maddening to try to figure out what was going on, since scale_fill_manual() changes the behavior silently.
It would be useful (and merciful on anyone trying to debug similar things) to have scale_fill_manual() spit out a message on NA's similar to
Code to demonstrate issue:
The image on the left shows the incorrect behavior. scale_fill_manual() is silently replacing my NA (transparency) with 'gray50'.
The image on the right shows how it should look. scale_fill_manual() does what I tell it to do, and renders transparency as actual transparency.