tidyverse / duckplyr

A drop-in replacement for dplyr, powered by DuckDB for performance.
https://duckplyr.tidyverse.org/
Other
253 stars 15 forks source link

find calls for `box` package use #221

Open stefanlinner opened 1 month ago

stefanlinner commented 1 month ago

I noticed that the function translation doesn't work (yet) in combination with the box package:

box::use(
  duckplyr
)
#> The duckplyr package is configured to fall back to dplyr when it encounters an
#> incompatibility. Fallback events can be collected and uploaded for analysis to
#> guide future development. By default, no data will be collected or uploaded.
#> → Run `duckplyr::fallback_sitrep()` to review the current settings.

options(duckdb.materialize_message = FALSE)

# Triggers fallback
mtcars_rows <- 
  mtcars |> 
  duckplyr$as_duckplyr_tibble() |> 
  duckplyr$mutate(row_num = duckplyr$row_number())
#> The duckplyr package is configured to fall back to dplyr when it encounters an
#> incompatibility. Fallback events can be collected and uploaded for analysis to
#> guide future development. By default, no data will be collected or uploaded.
#> ℹ A fallback situation just occurred. The following information would have been
#>   recorded:
#>   {"version":"0.4.1","message":"Can't translate function
#>   `duckplyr$row_number`.","name":"mutate","x":{"...1":"numeric","...2":"numeric","...3":"numeric","...4":"numeric","...5":"numeric","...6":"numeric","...7":"numeric","...8":"numeric","...9":"numeric","...10":"numeric","...11":"numeric"},"args":{"dots":{"...12":"...13$...14()"},".by":"NULL",".keep":["all","used","unused","none"]}}
#> → Run `duckplyr::fallback_sitrep()` to review the current settings.
#> → Run `Sys.setenv(DUCKPLYR_FALLBACK_COLLECT = 1)` to enable fallback logging,
#>   and `Sys.setenv(DUCKPLYR_FALLBACK_VERBOSE = TRUE)` in addition to enable
#>   printing of fallback situations to the console.
#> → Run `duckplyr::fallback_review()` to review the available reports, and
#>   `duckplyr::fallback_upload()` to upload them.
#> ℹ See `?duckplyr::fallback()` for details.
#> ℹ This message will be displayed once every eight hours.
#> Error processing with relational.
#> Caused by error in `rel_find_call()`:
#> ! Can't translate function `duckplyr$row_number`.

# Works as intended
mtcars_rows <- 
  mtcars |> 
  duckplyr$as_duckplyr_tibble() |> 
  duckplyr$mutate(row_num = duckplyr::row_number())

Created on 2024-08-07 with reprex v2.1.1

Session info ``` r sessionInfo() #> R version 4.2.3 (2023-03-15 ucrt) #> Platform: x86_64-w64-mingw32/x64 (64-bit) #> Running under: Windows 10 x64 (build 22631) #> #> Matrix products: default #> #> locale: #> [1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8 #> [3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C #> [5] LC_TIME=German_Germany.utf8 #> #> attached base packages: #> [1] stats graphics grDevices utils datasets methods base #> #> loaded via a namespace (and not attached): #> [1] rstudioapi_0.16.0 knitr_1.48 magrittr_2.0.3 tidyselect_1.2.1 #> [5] R6_2.5.1 rlang_1.1.4 fastmap_1.2.0 fansi_1.0.6 #> [9] dplyr_1.1.4 tools_4.2.3 xfun_0.46 utf8_1.2.4 #> [13] DBI_1.2.3 cli_3.6.3 withr_3.0.0 htmltools_0.5.8.1 #> [17] yaml_2.3.10 digest_0.6.36 tibble_3.2.1 lifecycle_1.0.4 #> [21] box_1.2.0 duckdb_1.0.0 vctrs_0.6.5 fs_1.6.4 #> [25] glue_1.7.0 evaluate_0.24.0 rmarkdown_2.27 reprex_2.1.1 #> [29] compiler_4.2.3 duckplyr_0.4.1 pillar_1.9.0 generics_0.1.3 #> [33] collections_0.3.7 jsonlite_1.8.8 pkgconfig_2.0.3 ```

I suggest adding the $ operator to the "fully qualified" package + name case. I also added an extra check that length(name) == 3 to avoid some (unexpected) nested special cases like list$packagename$functionname().

krlmlr commented 1 month ago

Thanks. While this change works, it also opens up too many possibilities for my taste.

Can we, for the case of $, test that pkg$fun actually evaluates to pkg::fun ?

To be fair, the current code only works if :: really is from the base package. Can/should we check this too?

stefanlinner commented 4 weeks ago

You are right. After playing around a bit, I found some examples that work, but shouldn't work in my opinion:

mtcars_rows <- 
  mtcars |> 
  duckplyr::as_duckplyr_tibble() |> 
  # There is no `test` package
  duckplyr::mutate(row_num = test::row_number())
#> The duckplyr package is configured to fall back to dplyr when it encounters an
#> incompatibility. Fallback events can be collected and uploaded for analysis to
#> guide future development. By default, no data will be collected or uploaded.
#> → Run `duckplyr::fallback_sitrep()` to review the current settings.

mtcars_rows <- 
  mtcars |> 
  duckplyr::as_duckplyr_tibble() |> 
  # There is no `test` package and `box` is not used
  duckplyr::mutate(row_num = test$row_number())

Created on 2024-08-19 with reprex v2.1.0

However, what should be possible, as this is a common use case with box:

box::use(
  test = duckplyr
)
#> The duckplyr package is configured to fall back to dplyr when it encounters an
#> incompatibility. Fallback events can be collected and uploaded for analysis to
#> guide future development. By default, no data will be collected or uploaded.
#> → Run `duckplyr::fallback_sitrep()` to review the current settings.

mtcars_rows <- 
  mtcars |> 
  test$as_duckplyr_tibble() |> 
  test$mutate(row_num = test$row_number())

Created on 2024-08-19 with reprex v2.1.0

So I think we should check for the following things:

  1. :: is actually from base
  2. pkg namespace / object exists
  3. fun is part of the package namespace / object
  4. make sure that fun is actually used from the provided namespace / object

What do you think?

krlmlr commented 4 weeks ago

Thanks. Let's wait a bit, I have a code locally that handles the test:: problem.

For the box::use(test = duckplyr) case, how do you intend to check that test actually maps to duckplyr? Can we make this conditional on isNamespaceLoaded("box") ?

stefanlinner commented 4 weeks ago

I was thinking of something like this at the top of rel_find_call:

name <- as.character(fun)

  if (length(name) == 3) {

    pkg <- name[[2]]
    fun_name <- name[[3]]

    if (name[[1]] == "::") {

      # ... :: case ...

    } else if (name[[1]] == "$") {

      # Check that box is used
      if(!isNamespaceLoaded("box")){
        cli::cli_abort("Meaningful error")
      }

      # Check that pkg object is available in GlobalEnvironment
      if(!exists(pkg, envir = env)) {
        cli::cli_abort("Meaningful error")
      }

      fun_box <- eval(fun)
      # Which namespace is the box object actually mapped to
      pkg_ns <- getNamespaceName(environment(fun_box))[["name"]]
      fun_mapped <- get(fun_name, envir = asNamespace(pkg_ns))

      # Check that box function and mapped function are identical
      if(!identical(fun_box, fun_mapped)) {
        cli::cli_abort("Meaningful error")
      }

      return(c(pkg_ns, fun_name))

    }

  }

So in the end the package name to which test is mapped to will be returned and not test itself.