markfairbanks / tidytable

Tidy interface to 'data.table'
https://markfairbanks.github.io/tidytable/
Other
450 stars 32 forks source link

ifelse both outcome are evaluated when mutate #581

Closed jfdesomzee closed 2 years ago

jfdesomzee commented 2 years ago

Hello,

I I have some data set that I populate using the content of some files. With dplyr I used to do this:

iris %>% 
  as_tibble %>% 
  rowwise() 
  dplyr::mutate(Response = ifelse(Species=="not a species",pull(read_file("D:/temp/does_not_exist.fst"),resp),"No file"))

which works image

The rowwise is not mandatory here but normaly the file to be read is different for each row.

Trying to move towards tidytable. I tried:

iris %>% 
  as_tidytable %>% 
  mutate.(.Row=1:.N) %>% 
  mutate.(Response = if_else.(Species=="not a species",pull(read_file("D:/temp/does_not_exist.fst"),resp),"No file"),
         .by=".Row")

But this returns: image

It looks like he's trying to read the file anyway. He executes the first part of the ifelse while it should not.

Any suggestion?

markfairbanks commented 2 years ago

ifelse() is actually a base R function, not a dplyr one.

However I can't seem to reproduce the behavior you described with a simple example. As far as I can tell both true and false conditions are executed even if not needed.

pacman::p_load(dplyr)

false_fn <- function() {
  print("false")
  "false"
}

true_fn <- function() {
  print("true")
  "true"
}

df <- tibble(x = 1)

df %>%
  mutate(check = ifelse(x == 2, true_fn(), false_fn()))
#> [1] "true"
#> [1] "false"
#> # A tibble: 1 × 2
#>       x check
#>   <dbl> <chr>
#> 1     1 false
Darxor commented 2 years ago

To add to that, if you are reading different file for each row, you aren't really using ifelse() for its main purpose - being a vectorized if (test) yes else no.

Behavior of ifelse() you are using isn't intended and definitely isn't universal. Example from @markfairbanks actually works as you describe on my system. But this probably wouldn't be added to if_else.() because of type checks.

Instead, consider refactoring your code to something like this (if you have more complex logic for "Response", its probably wise to create its own function, instead of this lambda):

iris %>%
  as_tidytable() %>%
  mutate.(
    Response = map2_chr.(
      Species == "not a species",
      "D:/temp/does_not_exist.fst",
      function(condition, filename) {
        if (condition) {
          readr::read_file(filename)[["resp"]]
        } else {
          "No file"
        }
      }
    )
  )
markfairbanks commented 2 years ago

I think in this case tidytable isn't meant to work with this. I think @Darxor makes the correct suggestion - this seems better suited for an if {} else {} statement.

It's also difficult for me to fix since apparently it works differently between different systems (or maybe R versions?).

jfdesomzee commented 2 years ago

Those returns true and false

df %>%
  dplyr::mutate(check = if_else(x == 2, true_fn(), false_fn()))

df %>%
  mutate.(check = ifelse(x == 2, true_fn(), false_fn()))

df %>%
  mutate.(check = base::ifelse(x == 2, true_fn(), false_fn()))

df %>%
  mutate.(check = if_else.(x == 2, true_fn(), false_fn()))

While those only return false

if. <- base::ifelse
df %>%
  mutate.(check = if.(x == 2, true_fn(), false_fn()))

df %>%
  dplyr::mutate(check = if.(x == 2, true_fn(), false_fn()))

df %>%
  dplyr::mutate(check = ifelse(x == 2, true_fn(), false_fn()))
markfairbanks commented 2 years ago

@jfdesomzee this now works in the development version of tidytable. You'll just need to call base::ifelse() because tidytable exports tidytable::ifelse(). I think the average user will still want the fast tidytable::ifelse(), but your case will still be covered.

Also - "dotless" versions of functions are now exported so you can use mutate() instead of mutate.() if you want.

# devtools::install_github("markfairbanks/tidytable")
library(tidytable, warn.conflicts = FALSE)
#> As of tidytable v0.9.0 dotless versions of functions are exported.
#> You can now use `arrange()`/`mutate()`/etc. directly.

false_fn <- function() {
  print("false")
  "false"
}

true_fn <- function() {
  print("true")
  "true"
}

df <- tidytable(x = 1)

df %>%
  mutate(check = base::ifelse(x == 2, true_fn(), false_fn()))
#> [1] "false"
#> # A tidytable: 1 × 2
#>       x check
#>   <dbl> <chr>
#> 1     1 false

Hope this helps - if you have any questions let me know.

jfdesomzee commented 2 years ago

Thanks, that's great. I don't need my if. function anymore. And the dotless version is a good news to.