tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

`if_any()` does not work as expected inside `mutate` when no inputs are provided #7077

Open ks8997 opened 1 month ago

ks8997 commented 1 month ago

This is related to #7074. As per #7072, if_any() is now supposed to return FALSE when no inputs are provided. This works correctly inside filter() but not inside mutate().

library(dplyr)

iris %>%
  filter(if_any(c(), \(x) x > 4000)) # returns no rows, as expected

iris %>%
  mutate(z = if_any(c(), \(x) x > 4000)) # incorrectly returns all TRUE
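
For context on why FALSE is the expected result here, a small base R sketch (it does not call dplyr, and the variable names are purely illustrative): reducing an empty set of logicals returns the identity element of the operator, which is FALSE for `|` and TRUE for `&`.

```r
# Reducing zero logical values gives the operator's identity element.
empty <- logical(0)

any_of_nothing <- any(empty)  # FALSE, the identity of `|`
all_of_nothing <- all(empty)  # TRUE, the identity of `&`

# The same idea written as an explicit fold seeded with the identity.
or_fold  <- Reduce(`|`, list(), FALSE)
and_fold <- Reduce(`&`, list(), TRUE)
```

By that reasoning, if_any() over zero columns would be FALSE and if_all() over zero columns would be TRUE.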

Apologies if I misunderstood something.

DavisVaughan commented 1 month ago

Yea I think you're right, I didn't double check for mutate(). Thanks for following up!

ks8997 commented 1 month ago

@DavisVaughan So I poked around a bit and I think the fix seems straightforward. The problematic section is the below chunk in if_across():

https://github.com/tidyverse/dplyr/blob/0005f6768fa765b3bba5148711967d58b6013037/R/across.R#L345-L347

This unconditionally returns TRUE for empty inputs.

My proposed fix is:

if_across <- function(op, df) {
  n <- nrow(df)
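  # Name of the calling function: "if_any" or "if_all"
  across_if_fn <- context_peek_bare("across_if_fn")

  # No inputs: return FALSE for if_any() and TRUE for if_all()
  if (!length(df)) {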
    if (across_if_fn == "if_any") {
      return(FALSE)
    } else {
      return(TRUE)
    }
  }

  combine <- function(x, y) {
    if (is_null(x)) {
      y
    } else {
      op(x, y)
    }
  }
  reduce(df, combine, .init = NULL)
}

across_if_fn stores which of the two functions we are calling. With no inputs, if_across() will then return FALSE for if_any() and TRUE for if_all(). The rest of the function is unchanged. I'm new to this but happy to submit a PR if this solution looks alright to you. I'll also add a test case.
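
To make the proposed behaviour easy to try outside of dplyr, here is a rough standalone sketch. It is not dplyr's internal code: `if_across_sketch()` and its `fn_name` argument are hypothetical stand-ins, with `fn_name` replacing the context lookup, and base `Reduce()` replacing the `reduce()` call, so that the example runs with base R only.

```r
# Hypothetical stand-in for the proposed if_across(); `fn_name` is passed
# explicitly instead of being read from dplyr's context (an assumption).
if_across_sketch <- function(op, cols, fn_name) {
  if (!length(cols)) {
    # No inputs: FALSE for "if_any", TRUE for anything else ("if_all")
    return(fn_name != "if_any")
  }
  # Otherwise fold the logical columns together with `op`
  Reduce(op, cols)
}

# Two logical "columns" with three rows each
cols <- list(a = c(FALSE, TRUE, TRUE), b = c(FALSE, TRUE, FALSE))

any_rows <- if_across_sketch(`|`, cols, "if_any")  # FALSE TRUE TRUE
all_rows <- if_across_sketch(`&`, cols, "if_all")  # FALSE TRUE FALSE

# Empty input: the case this issue is about
any_none <- if_across_sketch(`|`, list(), "if_any")  # FALSE
all_none <- if_across_sketch(`&`, list(), "if_all")  # TRUE
```

The scalar returned for empty input mirrors the proposal above; inside mutate() a length-one result would be recycled to the number of rows.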