Add `eval_tidy()` version with default

clauswilke commented 4 years ago

In ggplot2, we extensively use the pattern <data> %||% <default value>, as in color <- data$color %||% "red". This works great when we know what column we want to look up in the data frame, but it fails if we want to do this with eval_tidy(), because eval_tidy() fails instead of returning NULL.

In my own code, I've worked around this by defining a function eval_default() which works like eval_tidy() but swaps in a default value if eval_tidy() fails: https://github.com/wilkelab/cowplot/blob/06aeeb449ccc0fb343eb03c83111df7b171e73af/R/key_glyph.R#L145-L154

library(rlang)

eval_default <- function(x, data, default) {
  force(default)

  suppressWarnings(
    tryCatch(
      error = function(e) default,
      eval_tidy(x, data)
    )
  )
}

data <- data.frame(x = 1)
eval_default(quo(x), data, NULL)
#> [1] 1
eval_default(quo(y), data, NULL)
#> NULL

^{Created on 2020-01-12 by the reprex package (v0.3.0)}

Based on the conversation here, it seems I'm not the only one running into this issue, so maybe adding something like this function to rlang is justified.

One comment regarding my proposed implementation: I had to add suppressWarnings() to prevent a warning "restarting interrupted promise evaluation" that would sometimes pop up. I find this warning sufficiently disconcerting that I'm not sure I can just ignore it, but I don't know enough about the internals of R evaluation to understand what the warning means and whether it can be avoided instead of suppressed.

lionel- commented 4 years ago

I think this would encourage brittle and hacky programming patterns.

yutannihilation commented 4 years ago

But, I think it's a nature of R's lexical scoping that we cannot know whether an evaluation will success or not without trying actually evaluating it. In fact, we already see the pattern at least in ggplot2, cowplot and gghighlight. I believe this is worth adding in rlang.

clauswilke commented 4 years ago

I think my customizable key-glyph function is a good test case because it's rather short and simple: https://github.com/wilkelab/cowplot/blob/master/R/key_glyph.R

Is it possible to write a similar function without using eval_default()? Or would you argue the function has some fundamental problems and shouldn't exist as written?

lionel- commented 4 years ago

I think all these aes should take their default values in their signature.

Then if the user supplies a wrong expression, they will correctly get an error rather than rectangle_key_glyph() hiding it and giving a default.

yutannihilation commented 4 years ago

It's not that the supplied expression is wrong and we want to hide the errors. We just want to skip the evaluation where the expression cannot be evaluated.

For example, to explain https://github.com/tidyverse/ggplot2/issues/2963 in short, we want to skip splitting the layer whose data doesn't contain the variable to determine PANEL, but this cannot be done by investigating the expression. In this case, the expression is valid, but not on all the layers.

I think this is not a rare scenario because one of the purposes of capturing the expression is to evaluate it in multiple contexts.

lionel- commented 4 years ago

Thanks for the context, I see how this is useful. FWIW, I think the correct approach is to explicitly wrap eval_tidy() in tryCatch(). Ignoring errors in this way is really a stopgap rather than a principled idiom because this swallows programming errors made by the user. Making the tryCatch(error = function(...) default) explicit should have the effect of raising eyebrows when reading/reviewing code, which seems appropriate here.

lionel- commented 4 years ago

On the ggplot2 side, would it be possible to bind the same symbols inside each panel, and when the symbols are undefined use an active binding to throw a typed error? And then you'd only catch these particular errors in the tryCatch(). It sounds like this would produce more principled evaluation semantics.

yutannihilation commented 4 years ago

Thanks, sounds reasonable.

would it be possible to bind the same symbols inside each panel, and when the symbols are undefined use an active binding to throw a typed error?

Cool. I didn't come up with the idea... The remaining questions is whether we should support referring to the variable without symbol (e.g. get("x")). Anyway, I'm gradually feeling that each package should define their own wrapper of eval_tidy() & tryCatch() as you suggested.

lionel- commented 4 years ago

whether we should support referring to the variable without symbol (e.g. get("x"))

With an "undefined" active binding, this would have the same consequence as normal evaluation:

env_bind_active(current_env(), var = function() cat("hello\n"))

var
#> hello

get("var")
#> hello
#> NULL

clauswilke commented 4 years ago

I think all these aes should take their default values in their signature.

Yeah, this is also not possible in my case either. Legend glyphs should be drawn correctly for as many geoms as possible, and it's generally not known (in particular to the user!) which geoms provide which aesthetics to the legend drawing code. So the only meaningful action to take is to fill in default values if and only if the required aesthetics are missing.

In any case, I'm not so worried about the tryCatch() and more about the warning "restarting interrupted promise evaluation". Is there a way to avoid this other than suppressWarnings()? I don't understand the comment about active binding, so I'm not sure whether this applies in my case or not. I'll say, though, that I have no control over what data is fed into the key drawing code, I just have to accept whatever I'm given and make the best of it.

clauswilke commented 4 years ago

After pondering this some more, let me try to make a case for a slightly different behavior: A variant of eval_tidy() where missing objects are substituted with NULL instead of throwing an error. The reprex below explains why. In many cases, we're using eval_tidy() essentially just to look up columns in a data frame, but we don't get equivalent behavior if a missing column is encountered. This means that even if I provide the correct default value in the normal expression we would use (e.g., y %||% 0), the evaluation fails.


library(rlang)

get_aesthetics_value <- function(aes = x, data) {
  aes <- enquo(aes)

  eval_tidy(aes, data)
}

data <- data.frame(x = 1)

# try to access `y` directly from data frame
data$y # without default
#> NULL
data$y %||% 0 # with default
#> [1] 0

# try to access `y` via function
get_aesthetics_value(y, data) # without default
#> Error in eval_tidy(aes, data): object 'y' not found
get_aesthetics_value(y %||% 0, data) # with default, breaks
#> Error in is_null(x): object 'y' not found

^{Created on 2020-01-13 by the reprex package (v0.3.0)}

lionel- commented 4 years ago

So the only meaningful action to take is to fill in default values if and only if the required aesthetics are missing.

I'm not sure I understand the context fully, but maybe it would help if ggplot2 provided descriptor functions for the current value of the aesthetic. Then you could do something like fill = default_aes("fill", "grey20") in the signature. This would make it less magical and obscure.

Another possibly useful way would be to wrap the default value in a sentinel wrapper, so that ggplot2 knows this value should only be set if missing.

yutannihilation commented 4 years ago

@clauswilke In your case, if you can know all possible variable names in advance, I think data masks can be created with default values.

library(rlang)

eval_with_defaults <- function(q, data) {
  # default values
  mask <- list(x = 0, y = 0)
  # overwrite values if exists
  mask[names(data)] <- data

  eval_tidy(q, mask)
}

q <- quo(x + y %||% 10)

eval_with_defaults(q, data.frame(x = 1, y = 1))
#> [1] 2
eval_with_defaults(q, data.frame(x = 1))
#> [1] 1

^{Created on 2020-01-15 by the reprex package (v0.3.0)}

But, it's not the case with facets, IIUC. I thought we need to extract symbols to what variables we need to set active bindings. Sorry, I don't understand your suggestion, @lionel-...

With an "undefined" active binding, this would have the same consequence as normal evaluation:

What is "undefined" active binding? I expected something like this:

eval_with_typed_error <- function(quo, data) {
  possible_variables <- extract_symbols_as_strings(quo)
  local_bindings(!!possible_variables := function(e) raiseNotFoundError())
  eval_tidy(quo, data)
}

yutannihilation commented 4 years ago

restarting interrupted promise evaluation

This comes from here

https://github.com/wch/r-source/blob/3cb780c3f766316a8f75bd6d890720d630ad3e37/src/main/eval.c#L505

This probably indicates some default argument is referenced multiple times? I'm not sure if it's OK or not, though.

clauswilke commented 4 years ago

The warning arises when you try to evaluate an argument, it errors out, and then you evaluate it again. See this SO question: https://stackoverflow.com/questions/20596902/r-avoiding-restarting-interrupted-promise-evaluation-warning

There's also an answer that shows how to avoid the warning, but I don't understand why it works (and it's not explained).

yutannihilation commented 4 years ago

Thanks. If I understand correctly, our case is different from this SO. As long as we pass a quosure instead of raw expression, evaluating a promise doesn't fail (it just returns a quosure), while the content (?) of the promise (a quosure) might be failed to evaluated. So, I have no idea why you see that warning...

library(rlang)

foo <- function() stop("Foo error")

bar2 <- function(x) {
  try(eval_tidy(x))
  x
}

bar2(quo(foo()))
#> Error in foo() : Foo error
#> <quosure>
#> expr: ^foo()
#> env:  global

warnings()

^{Created on 2020-01-15 by the reprex package (v0.3.0)}

The modern rewrite of the SO answer would be something like below (c.f. eval.parent(substitute(x)) is equivalent to x_expr <- substitute(x); eval(x_expr, parent.frame()))

bar3 <- function(x) {
  x_quo <- enquo(x)
  try(eval_tidy(x_quo))
  x
}

clauswilke commented 4 years ago

So, I have no idea why you see that warning

Actually, me neither. The problem is that in all my simple test code, I cannot generate the warning. But when I use the glyph drawing code inside ggplot2, in some instances the warning gets triggered.

yutannihilation commented 4 years ago

Hmm, curious...

lionel- commented 4 years ago

What is "undefined" active binding?

It's an active binding that throws an "undefined" typed error.

With your local_bindings() example, you're just binding the functions in the normal way. The alternatives are to bind the functions as lazy bindings (promises) or as active bindings. The advantage of the latter is that you can evaluate multiple times with the same mask, and this takes care of corner cases when debugging with browser(), recovering with a calling handler etc. By comparison, a lazy binding will only throw once.

In your case, if you can know all possible variable names in advance, I think data masks can be created with default values.

That sounds like a better approach!

yutannihilation commented 4 years ago

Thanks for the detailed explanation! I see.

yutannihilation commented 4 years ago

I'm playing with active bindings at https://github.com/tidyverse/ggplot2/pull/3735. Not sure if I'll successfully complete this PR at the moment, but now I feel ggplot2 can manage to handle the cases without this version of eval_tidy()...

clauswilke commented 4 years ago

Considering how much new and non-trivial code https://github.com/tidyverse/ggplot2/pull/3735 contains, and that I would have to implement more-or-less the same for my use case, I'm wondering whether it is possible to implement a convenience function that performs this task, i.e., a function that injects active bindings to some symbol names into a quosure.

yutannihilation commented 4 years ago

Agreed. I think I'll need some more utilities for tweaking data mask. But, currently we don't have established ecosystem of typed errors, so it might be a bit early to see what tools are needed for this.

clauswilke commented 4 years ago

I was thinking about something generic, such as quo_set_active_bindings(quo, bindings).

yutannihilation commented 4 years ago

Ah, sorry, I went a bit wrong direction at the first try, but I now think we don't need to tweak a quosure but a data mask. (I was also wrong above at commenting "it's not the case with facets." Now it turned out the solution is very similar.)

lionel- commented 4 years ago

I now think we don't need to tweak a quosure but a data mask

Agreed, in general, having to modify the contents of a quosure (the expression or the env) is a sign of a wrong approach.

clauswilke commented 4 years ago

The talk today by @paleolimbot made me realize the correct pattern for what I want to achieve. I need to use the .data pronoun. Then my code just works, without even needing a tryCatch() call. Not sure if this applies to @yutannihilation's case as well or not.

library(rlang)

get_aesthetics_value <- function(aes = x, data) {
  aes <- enquo(aes)

  eval_tidy(.data[[aes]], data)
}

data <- data.frame(x = 1)

# without default
get_aesthetics_value(y, data)
#> NULL

# with default
get_aesthetics_value(y, data) %||% 0
#> [1] 0

^{Created on 2020-01-30 by the reprex package (v0.3.0)}

clauswilke commented 4 years ago

Sorry for this noise, this doesn't quite work yet, but I think I'm on the right track here regardless.

yutannihilation commented 4 years ago

Hmm, isn't this that you forget to pass a quosure to eval_tidy()...? Your get_aesthetics_value() returns NULL for every input.

library(rlang)

get_aesthetics_value <- function(aes = x, data) {
  aes <- as_string(enexpr(aes))

  eval_tidy(quo(.data[[!!aes]]), data)
}

data <- data.frame(x = 1)

# without default
get_aesthetics_value(y, data)
#> Error: Column `y` not found in `.data`

# with default
get_aesthetics_value(y, data) %||% 0
#> Error: Column `y` not found in `.data`

get_aesthetics_value(x, data)
#> [1] 1

^{Created on 2020-01-31 by the reprex package (v0.3.0)}

yutannihilation commented 4 years ago

In your case, what about using base::eval() with an environment containing .data?

get_aesthetics_value <- function(aes = x, data) {
  aes <- as_string(ensym(aes))

  eval(expr(.data[[!!aes]]), env(.data = data))
}

lionel- commented 4 years ago

@yutannihilation just FYI .data[[ is already an unquote operator, so there's no need to use !! inside it.

env(.data = data)

This is a little unhygienic because this environment inherits from the function environment and then the internal namespace. How about just env_get()? Or is the user able to provide arbitrary expressions? In that case you need the user environment, probably as a quosure.

@clauswilke Is there a reason you're defusing the aes argument? My feeling is that taking aesthetic names as strings would be more consistent with the ggplot2 UI, because I don't think aesthetics are data masking. Or perhaps I'm missing something?

yutannihilation commented 4 years ago

.data[[ is already an unquote operator,

Oh, thanks. I didn't notice this...

My point was that the need for NULL default probably indicates it's not a data mask that we need here because it's a behaviour of a bare list or environment. But, probably I'm failing to see the context... Sorry for the noise.

clauswilke commented 4 years ago

@lionel- I'm coming around to your perspective that silently failing in all cases may not be a good idea. Remember the example from Friday where I couldn't get the legend to show the point color? Well, that was exactly this situation, a simple misspelling that is silently caught and circumvented.

library(ggplot2)
library(cowplot)
library(ggridges)

# breaks mysteriously, problem is misspelled aesthetic
# (point_color is internally converted into point_colour)
ggplot(iris, aes(x = Sepal.Length, y = Species, point_color = Species)) +
  geom_density_ridges(
    jittered_points = TRUE,
    position = "raincloud",
    alpha = 0.7,
    key_glyph = rectangle_key_glyph(fill = point_color, padding = margin(3, 3, 3, 3))
  )
#> Picking joint bandwidth of 0.181


# works
ggplot(iris, aes(x = Sepal.Length, y = Species, point_color = Species)) +
  geom_density_ridges(
    jittered_points = TRUE,
    position = "raincloud",
    alpha = 0.7,
    key_glyph = rectangle_key_glyph(fill = point_colour, padding = margin(3, 3, 3, 3))
  )
#> Picking joint bandwidth of 0.181

^{Created on 2020-02-01 by the reprex package (v0.3.0)}

lionel- commented 4 years ago

Thanks for the discussion!

r-lib / rlang

Add `eval_tidy()` version with default #888