tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org

Vignette ggplot2-in-packages does not cover programmatically passing NULL to aes() #6208

Open kkellysci opened 2 days ago

kkellysci commented 2 days ago

I found a problem with the ggplot2-in-packages vignette.

I expected it to cover how to fully replace all behaviors from the now-deprecated aes_string() function. However, both of its recommended solutions (the .data pronoun and the embrace operator) have limitations that make them difficult to use to replace aes_string(), and the vignette does not cover how to work around these limitations.

Here is some test data for demonstrations:

library(ggplot2)

set.seed(1)
plotdat <- data.frame(height=rnorm(10,mean=160,sd=10),weight=rnorm(10,mean=70,sd=10))
plotdat$sex <- sample(c('F','M'),10,replace=TRUE)
plotdat$age <- round(rnorm(10,mean=40,sd=20))
plotdat$employment <- sample(c('Employed','Unemployed','Retired'),10,replace=TRUE)

The problem with the .data pronoun is that it cannot handle situations where an aesthetic is set to NULL to indicate it should not be used:

plot_data_pronoun <- function(dat,xvar,yvar,colorvar=NULL,shapevar=NULL) {
  myplot <- ggplot(dat,aes(x=.data[[xvar]],y=.data[[yvar]],color=.data[[colorvar]],shape=.data[[shapevar]])) +
    geom_point()
  return(myplot)
}

plot_data_pronoun(plotdat,'weight','height',colorvar='employment',shapevar=NULL)

# This fails with:
# Error in `geom_point()`:
#  ! Problem while computing aesthetics.
# Error occurred in the 1st layer.
# Caused by error in `.data[[NULL]]`:
#  ! Must subset the data pronoun with a string, not `NULL`.

The embrace operator can handle NULL, but it does not work when the variable name (or NULL) for an aesthetic is only determined at run time and so cannot be written into the function call:

plot_embrace <- function(dat,xvar,yvar,colorvar=NULL,shapevar=NULL) {
  myplot <- ggplot(dat,aes(x={{ xvar }},y={{ yvar }},color={{ colorvar }},shape={{ shapevar }})) +
    geom_point()
  return(myplot)
}

whichvar <- sample(c('sex','employment'),1)
plot_embrace(plotdat,weight,height,colorvar=age,shapevar=whichvar) # This produces an incorrect plot that does not use the column specified by "whichvar"

Both of these scenarios could be handled easily and seamlessly by aes_string():

plot_aes_string <- function(dat,xvar,yvar,colorvar=NULL,shapevar=NULL) {
  myplot <- ggplot(dat,aes_string(x=xvar,y=yvar,color=colorvar,shape=shapevar)) + geom_point()
  return(myplot)
}

# Scenario 1: Passing NULL for an unused aesthetic
plot_aes_string(plotdat,'weight','height',colorvar='employment',shapevar=NULL) # This works

# Scenario 2: The variable used for shape isn't determined until the code runs
whichvar <- sample(c('sex','employment'),1)
plot_aes_string(plotdat,'weight','height',colorvar='age',shapevar=whichvar) # This works

# Scenario 3: Both things at once
whichvar <- NULL
plot_aes_string(plotdat,'weight','height',colorvar='age',shapevar=whichvar) # This works

Please consider updating the ggplot2-in-packages vignette to demonstrate how tidy evaluation idioms can be used to allow a function which uses ggplot() to handle both of these scenarios.

(I am submitting this as a bug report rather than a StackExchange question because, while I can figure out a duct-tape workaround for these issues, I think it is important that the ggplot2 authors' intended/best practice solution to these issues be included in the official documentation. Or, if there is currently no best practice solution that can handle both scenarios, there may be a need to create one to help users move away from aes_string().)

kkellysci commented 2 days ago

Here is the duct-tape workaround I came up with, based on some of the solutions people posted in this StackExchange question:

plot_workaround <- function(dat,xvar,yvar,colorvar=NULL,shapevar=NULL) {
  aes_args <- list(x=xvar,y=yvar,color=colorvar,shape=shapevar)
  aes_args <- lapply(aes_args, function(x) { if(!is.null(x)) rlang::data_sym(x) } ) 
  myplot <- ggplot(dat,aes(!!!aes_args)) + geom_point()
  return(myplot)
}

# Scenario 1: Passing NULL for an unused aesthetic
plot_workaround(plotdat,'weight','height',colorvar='employment',shapevar=NULL) # This works

# Scenario 2: The variable used for shape isn't determined until the code runs
whichvar <- sample(c('sex','employment'),1)
plot_workaround(plotdat,'weight','height',colorvar='age',shapevar=whichvar) # This works

# Scenario 3: Both things at once
whichvar <- NULL
plot_workaround(plotdat,'weight','height',colorvar='age',shapevar=whichvar) # This works

Hopefully there's a shorter/cleaner "best practices" way to achieve this that can be added to the documentation/vignette, but in the meantime I'm sharing this workaround in case it's of use to anyone else.

teunbrand commented 2 days ago

Thanks for the report! I'm not sure if aes(var = NULL) is intended usage to turn off an aesthetic and if we advertise/document that somewhere. If it is, there should be a tidy eval path to set this programmatically. If it isn't, we can mark this as off-label use and leave things as they are, or decide that it should be intended usage.

Currently, I don't see a clear path to do this the tidy eval way, as the .data pronoun doesn't allow subsetting with NULL. Note that you can use plot_embrace(plotdat, weight, height, colorvar = age, shapevar = .data[[whichvar]]).
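For reference, the call suggested above can be tried as a self-contained sketch. It repeats the test data and plot_embrace() from the top of this issue (lightly reformatted) and changes nothing else. The fixed columns are passed as bare names and the run-time one is wrapped in .data[[ ]] at the call site:

```r
library(ggplot2)

# Test data as defined at the top of this issue
set.seed(1)
plotdat <- data.frame(
  height = rnorm(10, mean = 160, sd = 10),
  weight = rnorm(10, mean = 70, sd = 10)
)
plotdat$sex <- sample(c("F", "M"), 10, replace = TRUE)
plotdat$age <- round(rnorm(10, mean = 40, sd = 20))
plotdat$employment <- sample(c("Employed", "Unemployed", "Retired"), 10, replace = TRUE)

# plot_embrace() as defined earlier in this issue
plot_embrace <- function(dat, xvar, yvar, colorvar = NULL, shapevar = NULL) {
  ggplot(dat, aes(x = {{ xvar }}, y = {{ yvar }}, color = {{ colorvar }}, shape = {{ shapevar }})) +
    geom_point()
}

# Column name that is only known at run time
whichvar <- sample(c("sex", "employment"), 1)

# Bare names for the fixed columns, .data[[ ]] for the run-time one
p <- plot_embrace(plotdat, weight, height, colorvar = age, shapevar = .data[[whichvar]])
```

This still does not cover whichvar being NULL, since .data[[NULL]] fails as shown in the original report.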

I'm personally in favour of recognising or making this intended usage. I use it all the time to escape a particular global aesthetic in single layers.
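A minimal sketch of that use case, using mtcars as stand-in data (the choice of dataset, columns, and geoms is only an assumption for illustration): the colour mapping is set globally, and aes(colour = NULL) in one layer stops that layer from inheriting it.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  # Inherits the global colour mapping: points are coloured by cyl
  geom_point() +
  # colour = NULL escapes the global mapping for this layer only,
  # so a single trend line is fitted across all cylinder groups
  geom_smooth(aes(colour = NULL), method = "lm", formula = y ~ x)
```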

kkellysci commented 2 days ago

You're right!!

And my work-around only covers the more limited situation "sometimes this function makes ggplots with this aesthetic and sometimes without" (a way to do that programmatically seems like an essential feature).

But in a situation like you describe (something is set as a global aesthetic, but then you want to disable it for one layer), it would take a longer/more complex workaround than mine, since mine just silently drops/omits NULLs.

Since setting an aesthetic to NULL is useful for both situations, I hope it's recognized as (or made) an intended usage and gets a supported tidy eval way to do it.

smouksassi commented 2 days ago

one concept that I always need to explain to newcomers to ggplot2 is the "precedence" between what is inside aes() and what is outside (let alone how to program each situation in a function and/or in shiny)

library(ggplot2)
mtcars$cylf <- as.factor(mtcars$cyl)

ggplot(mtcars,aes(cylf,mpg,col=cylf))+ geom_point()

ggplot(mtcars,aes(cylf,mpg,col=cylf))+ geom_point()+ geom_point(aes(x=as.numeric(cylf)+0.1),col="red")

ggplot(mtcars,aes(cylf,mpg,col=cylf))+ geom_point(col="green")

ggplot(mtcars,aes(cylf,mpg,col=cylf))+ geom_point(col=NULL)

ggplot(mtcars,aes(cylf,mpg,col=cylf))+ geom_point(aes(col=NULL),col="red")

ggplot(mtcars,aes(cylf,mpg,col=cylf))+ geom_point(col="red")

this one errors:

ggplot(mtcars,aes(cylf,mpg,col=cylf))+ geom_point(col=NULL)

https://stackoverflow.com/questions/74414272/how-to-replace-the-deprecated-ggplot2-function-aes-string-accepting-an-arbitrar

there is also a use case when you want to transform your variable inline, for example: aes(x = log(varname))

kkellysci commented 2 days ago

Interesting point about the precedence, but it still doesn’t cover the basic scenario of “sometimes this function makes ggplots with this aesthetic and sometimes without”.

When a user passes a function variable names to map to ggplot aesthetics, that should include being able to pass something that says not to map that aesthetic.

Even a basic scatter plot with three “optional” aesthetics (fill, color, shape) would need 2^3 = 8 if statements to handle all possible combinations of NULL/non-NULL values if the NULLs couldn’t be passed on to aes(), e.g.

if(!is.null(colorvar) && is.null(fillvar) && is.null(shapevar)) {
    myplot <- ggplot(dat,aes(x=.data[[xvar]],y=.data[[yvar]],color=.data[[colorvar]])) + geom_point()
} else if(!is.null(colorvar) && !is.null(fillvar) && is.null(shapevar)) {
    myplot <- ggplot(dat,aes(x=.data[[xvar]],y=.data[[yvar]],color=.data[[colorvar]],fill=.data[[fillvar]])) + geom_point()
} else if(!is.null(colorvar) && !is.null(fillvar) && !is.null(shapevar)) {
    myplot <- ggplot(dat,aes(x=.data[[xvar]],y=.data[[yvar]],color=.data[[colorvar]],fill=.data[[fillvar]],shape=.data[[shapevar]])) + geom_point()
} else if(is.null(colorvar) && !is.null(fillvar) && !is.null(shapevar)) {
    myplot <- ggplot(dat,aes(x=.data[[xvar]],y=.data[[yvar]],fill=.data[[fillvar]],shape=.data[[shapevar]])) + geom_point()
}
# And so on for the rest of the combinations

aes() itself has no trouble with an aesthetic being set to NULL. The problem is just that there’s no documented tidy eval way to pass a NULL contained in a variable to aes(), which was an important thing aes_string() could handle.
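One possible sketch of avoiding the chain of if statements is shown here. It is not a documented or official idiom, and it is only a variation on the !!! workaround posted earlier in this thread. It injects each aesthetic separately with !! through a small helper. The names col_or_null() and plot_optional() are hypothetical and not part of ggplot2 or rlang. mtcars is used as stand-in data.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2 or rlang): turns a column name
# into the expression `.data$<name>` and passes NULL through unchanged
col_or_null <- function(var) {
  if (is.null(var)) NULL else rlang::data_sym(var)
}

# Hypothetical wrapper: every optional aesthetic defaults to NULL (unmapped)
plot_optional <- function(dat, xvar, yvar, colorvar = NULL, fillvar = NULL, shapevar = NULL) {
  ggplot(dat, aes(
    x = !!col_or_null(xvar),
    y = !!col_or_null(yvar),
    color = !!col_or_null(colorvar),
    fill = !!col_or_null(fillvar),
    shape = !!col_or_null(shapevar)
  )) +
    geom_point()
}

mtcars$cylf <- factor(mtcars$cyl)

# Colour mapped, shape left as NULL
p1 <- plot_optional(mtcars, "wt", "mpg", colorvar = "cylf")

# Colour and shape both mapped, with the shape column chosen in a variable
shape_col <- "cylf"
p2 <- plot_optional(mtcars, "wt", "mpg", colorvar = "cylf", shapevar = shape_col)
```

Because each NULL reaches aes() as an explicit NULL, the same helper could in principle also be used inside a single layer's aes(), though that is untested here.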

smouksassi commented 2 days ago

you also have the size aes that applies to scatter plots and if we add lines then you add group, linewidth and linetype...

kkellysci commented 2 days ago

I was also able to find an example of aesthetics being mapped to NULL in the ggplot2 documentation, which suggests this is an intended usage:

yrng <- range(economics$unemploy)
p <- p +
   geom_rect(
       aes(NULL, NULL, xmin = start, xmax = end, fill = party),
       ymin = yrng[1], ymax = yrng[2], data = presidential
   )

So hopefully if setting aesthetics to NULL in aes() is an intended usage, there can be a documented tidy eval way to take a NULL contained in a variable and pass it to aes().
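The quoted documentation snippet does not show how p was created. A self-contained version might look like the following, assuming p is a line plot of unemploy against date from the economics dataset (that starting plot is my assumption, not part of the quoted snippet):

```r
library(ggplot2)

# Assumption: `p` starts as a line plot of unemployment over time
p <- ggplot(economics, aes(date, unemploy)) + geom_line()

yrng <- range(economics$unemploy)
p <- p +
  geom_rect(
    # The two NULLs unset the inherited x and y mappings for this layer,
    # since `presidential` has no date or unemploy columns
    aes(NULL, NULL, xmin = start, xmax = end, fill = party),
    ymin = yrng[1], ymax = yrng[2], data = presidential
  )
```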

teunbrand commented 2 days ago

I was also able to find an example of aesthetics being mapped to NULL

Thanks for finding this, that settles the 'is it intended' question then as 'yes'.

there can be a documented tidy eval way to take a NULL contained in a variable and pass it to aes()

We'd first need to find a way before we can document it.

ggplot(mtcars,aes(cylf,mpg,col=cylf))+ geom_point(col=NULL)

I'm not sure what exactly the intent is here? In dev ggplot2 this no longer errors, but throws a warning: "Ignoring empty aesthetic: colour.".

smouksassi commented 2 days ago

apologies, yes I am using the latest released version (I also remembered when I had to use shQuote(): https://stackoverflow.com/questions/28777626/how-do-i-combine-aes-and-aes-string-options )

my intent is that we cover all potential programming needs, showing how to move away from aes_string() once and for all

teunbrand commented 2 days ago

my intent is that we cover all potential programming needs, showing how to move away from aes_string() once and for all

Yeah, that tracks, but what is the intent of geom_point(col = NULL) specifically?

smouksassi commented 2 days ago

I remember needing this when I was programming a shiny app where the user had several ways to select options in the UI:

color is NULL by default and is a global aes.
the user might want to deactivate the global aesthetic for a specific layer, i.e. force a color (ignore the globally mapped color).
the user forgot to select a non-NULL value for the specified color: geom_point(col = NULL)

I agree that outside of a UI where we are trying to capture every user input possibility this might not be very relevant

teunbrand commented 2 days ago

Thanks, that makes sense. I suppose with the dev version, you could just use suppressWarnings() to omit this empty aesthetic warning.

However, that is a different issue from programmatically passing NULL as an aesthetic in aes().