njtierney / geotargets

Targets extensions for geospatial data
https://njtierney.github.io/geotargets/
Other
49 stars 4 forks source link

Apparent bugs / problems with function body modification with covr+testthat+tar_test() #38

Closed brownag closed 3 months ago

brownag commented 3 months ago

Replacement of a function body does not seem to work properly within the evaluation environment created by combination of {covr}, targets tests with tar_test(), and {testthat}. We dynamically modify our internal functions using body<- to customize the functions passed to tar_format().

Interactive runs, local runs, runs of {testthat} locally and remotely will all work fine, but when running tests via {covr} the problem arises.

I did some digging the other day into this issue, and there is a great vignette on how {covr} works behind the scenes: https://cran.r-project.org/web/packages/covr/vignettes/how_it_works.html

From the vignette:

The core function in covr is trace_calls(). This function was adapted from ideas in Advanced R - Walking the Abstract Syntax Tree with recursive functions. This recursive function modifies each of the leaves (atomic or name objects) of an R expression by applying a given function to them. If the expression is not a leaf the walker function calls itself recursively on elements of the expression instead.

My suspicion is whatever {covr} does to modify the code during its processing causes a situation where no modification to the tar_format() read/write function occurs when we call body<-.

To solve this, I think we need to isolate the bug in a reprex and report it to {covr} maintainers. Probably this is not intended behavior, since other evaluation contexts have no apparent issues. In the meantime we could possibly find a way to "protect" or exclude the specific functions from being modified by {covr}.

I have tried several different alternatives to current approach--including converting function body to string and modifying the string, replacing the whole function; using an intermediate object; storing the function in a different environment and modifying it there--and no method of replacement with body<- seems to work.

These tests indicate to me it is not something simple like that the {covr} modified function body has more than 2 elements, or that the names of the arguments are missing, or that it is evaluating in the wrong environment. It seems to mealmost like there is a copy of the function made, and it corresponds to the initial function definition, and the body calls do not get applied to the right version of the function, which is what is ultimately passed to tar_format()

@Aariq had some good thoughts on covr related issues worth following up on:

Would it be helpful to maybe look at how targets, tarchetypes, etc. deals with this? There might be some hints in their github actions or in commit history. I'm assuming the coverage issues have something to do with callr R sessions in tests, but I don't actually know. Not sure if this is helpful, but I've noticed that the tests use the installed version of geotargets because of the namepacing with geotargets:: rather than using the current state of the code.

Originally posted by @Aariq in https://github.com/njtierney/geotargets/issues/31#issuecomment-2000067999

brownag commented 3 months ago

I am going to add some reprexes here; starting with this suggestion from @njtierney on #39

The following works interactively, in a reprex, or similar, but will not work if example_fun_fun()$write is passed to tar_format()


I wonder if we even actually need to modify the function body directly? I'm fairly sure those parts will be evaluated appropriately?

E.g.,

example_funfun <- function(sep_type){
  .write_fun <- function(object, file){
    # loudly declare the quote type when writing
    readr::write_lines(x = object,
                       file = file,
                       sep = sep_type)
    cat("Hi! The separation marker is: '", sep_type,"'", sep = "")
    cat("\n")
    cat("We wrote the file out to:\n", file, sep = "")
  }

  return(
    list(
      read = readr::read_csv,
      write = .write_fun
    )
  )

}

my_example <- example_funfun(",")

# sep_type isn't evaluated yet
my_example$write
#> function(object, file){
#>     # loudly declare the quote type when writing
#>     readr::write_lines(x = object,
#>                        file = file,
#>                        sep = sep_type)
#>     cat("Hi! The separation marker is: '", sep_type,"'", sep = "")
#>     cat("\n")
#>     cat("We wrote the file out to:\n", file, sep = "")
#>   }
#> <environment: 0x12a2669e0>

# but sep_type gets evaluated later when the function is used:
my_example$write(
  object = "1,2,3,4", 
  file = tempfile(fileext = ".csv")
  )
#> Hi! The separation marker is: ','
#> We wrote the file out to:
#> /var/folders/9c/k3wqmhhx4qsb3fd66n4prhlw0000gq/T//RtmpUdvv1a/file104a0120aafb7.csv

Created on 2024-03-16 with reprex v2.1.0

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.3.3 (2024-02-29) #> os macOS Sonoma 14.3.1 #> system aarch64, darwin20 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz Australia/Hobart #> date 2024-03-16 #> pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.0) #> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.0) #> cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.1) #> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.0) #> digest 0.6.34 2024-01-11 [1] CRAN (R 4.3.1) #> evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.1) #> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.1) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0) #> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.0) #> glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.1) #> hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0) #> htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.1) #> knitr 1.45 2023-10-30 [1] CRAN (R 4.3.1) #> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0) #> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0) #> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.0) #> R.cache 0.16.0 2022-07-21 [2] CRAN (R 4.3.0) #> R.methodsS3 1.8.2 2022-06-13 [2] CRAN (R 4.3.0) #> R.oo 1.26.0 2024-01-24 [2] CRAN (R 4.3.1) #> R.utils 2.12.3 2023-11-18 [2] CRAN (R 4.3.1) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0) #> readr 2.1.5 2024-01-10 [1] CRAN (R 4.3.1) #> reprex 2.1.0 2024-01-11 [2] CRAN (R 4.3.1) #> rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.1) #> rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) #> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.0) #> sessioninfo 1.2.2 2021-12-06 [2] CRAN (R 4.3.0) #> styler 1.10.2 2023-08-29 [2] CRAN (R 4.3.0) #> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.0) #> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0) #> tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0) #> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1) #> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.1) #> vroom 1.6.5 2023-12-05 [1] CRAN (R 4.3.1) #> withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.1) #> xfun 0.42 2024-02-08 [1] CRAN (R 4.3.1) #> yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.1) #> #> [1] /Users/nick/Library/R/arm64/4.3/library #> [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library #> #> ────────────────────────────────────────────────────────────────────────────── ```

Or is there some deeper stuff happening with targets here?

_Originally posted by @njtierney in https://github.com/njtierney/geotargets/pull/39#discussion_r1527125475_


Compare this to running the same process via {targets}

library(targets)

tar_script({
    tar_custom_writecsv <- function(name,
                                    command,
                                    pattern = NULL,
                                    sep_type = NULL,
                                    tidy_eval = targets::tar_option_get("tidy_eval"),
                                    packages = targets::tar_option_get("packages"),
                                    library = targets::tar_option_get("library"),
                                    repository = targets::tar_option_get("repository"),
                                    iteration = targets::tar_option_get("iteration"),
                                    error = targets::tar_option_get("error"),
                                    memory = targets::tar_option_get("memory"),
                                    garbage_collection = targets::tar_option_get("garbage_collection"),
                                    deployment = targets::tar_option_get("deployment"),
                                    priority = targets::tar_option_get("priority"),
                                    resources = targets::tar_option_get("resources"),
                                    storage = targets::tar_option_get("storage"),
                                    retrieval = targets::tar_option_get("retrieval"),
                                    cue = targets::tar_option_get("cue")) {

        name <- targets::tar_deparse_language(substitute(name))

        envir <- targets::tar_option_get("envir")

        command <- targets::tar_tidy_eval(
            expr = as.expression(substitute(command)),
            envir = envir,
            tidy_eval = tidy_eval
        )

        pattern <- targets::tar_tidy_eval(
            expr = as.expression(substitute(pattern)),
            envir = envir,
            tidy_eval = tidy_eval
        )

        example_funfun <- function(sep_type) {
            .write_fun <- function(object, path) {
                # loudly declare the quote type when writing
                readr::write_lines(x = object,
                                   file = file,
                                   sep = sep_type)
                cat("Hi! The separation marker is: '", sep_type,"'", sep = "")
                cat("\n")
                cat("We wrote the file out to:\n", file, sep = "")
            }

            return(
                list(
                    read = function(path) readr::read_csv(path),
                    write = .write_fun
                )
            )

        }

        fmt <- targets::tar_format(
            read = example_funfun(sep_type)$read,
            write = example_funfun(sep_type)$write
        )

        targets::tar_target_raw(
            name = name,
            command = command,
            pattern = pattern,
            packages = packages,
            library = library,
            format = fmt,
            repository = repository,
            iteration = iteration,
            error = error,
            memory = memory,
            garbage_collection = garbage_collection,
            deployment = deployment,
            priority = priority,
            resources = resources,
            storage = storage,
            retrieval = retrieval,
            cue = cue
        )
    }

    list(tar_custom_writecsv(test_writecsv, cars, sep_type = "|"))
})

tar_make()
#> ▶ dispatched target test_writecsv
#> ✖ errored target test_writecsv
#> ✖ errored pipeline [0.346 seconds]
#> Error:
#> ! Error running targets::tar_make()
#> Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
#> Debugging guide: https://books.ropensci.org/targets/debugging.html
#> How to ask for help: https://books.ropensci.org/targets/help.html
#> Last error message:
#>     _store_ object 'sep_type' not found
#> Last error traceback:
#>     No traceback available.
#> Backtrace:
#>     ▆
#>  1. └─targets::tar_make()
#>  2.   └─targets:::callr_outer(...)
#>  3.     ├─targets:::if_any(...)
#>  4.     └─targets:::callr_error(traced_condition = out, fun = fun)
#>  5.       └─targets::tar_throw_run(message, class = class(traced_condition$condition))
#>  6.         └─targets::tar_error(...)
#>  7.           └─rlang::abort(message = message, class = class, call = tar_empty_envir)
x <- tar_read(test_writecsv)
#> Error: '_targets/objects/test_writecsv' does not exist in current working directory ('/tmp/RtmpxNHiyN/reprex-691145ccf529-stuck-crane').
x
#> Error in eval(expr, envir, enclos): object 'x' not found
brownag commented 3 months ago

I don't think that will work, or at least I tried it before arriving at the body<- solution I proposed in #11 here. Tried doing it again now because I hoped I had missed something.

What you propose works fine interactively in your global environment, or in a reprex, etc. but the context where those tar_format() functions get evaluated you will not have the e.g. sep_type object defined.

Essentially the functions tar_format() uses need to be self-contained, not relying on the prior parent frame. You actually want sep_type to be evaluated sooner as it is when we modify the function body.

If you implement your suggestion for tar_terra_rast() you will get something like this:

Loading required namespace: terra
▶ dispatched target test_terra_rast
✖ errored target test_terra_rast
✖ errored pipeline [0.13 seconds]
Error:
! Error running targets::tar_make()
Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
Debugging guide: https://books.ropensci.org/targets/debugging.html
How to ask for help: https://books.ropensci.org/targets/help.html
Last error message:
    _store_ object 'filetype' not found
Last error traceback:
    No traceback available.
```r # tar-terra-rast.R: create_format_terra_raster #... .myfun <- function(filetype, gdal) { .write_terra_raster <- function(object, path) { terra::writeRaster( object, path, filetype = filetype, overwrite = TRUE, gdal = gdal, ) } return(list(write = .write_terra_raster)) } targets::tar_format( read = function(path) terra::rast(path), write = .myfun(filetype, gdal)$write, marshal = function(object) terra::wrap(object), unmarshal = function(object) terra::unwrap(object) ) # ... ```

Some other ideas I tried were using purrr::partial() to compose functions with some arguments pre-specified and couldn't get that working. Also forcing evaluation and injecting values with {rlang} metaprogramming operators also did not appear to work, but that might be worth re-investigating now.

_Originally posted by @brownag in https://github.com/njtierney/geotargets/pull/39#discussion_r1527184740_

njtierney commented 3 months ago

I'm curious if you've run into this issue before @wlandau?

wlandau commented 3 months ago

In the functions supplied to tar_format(), it looks like sep_type is a global variable stored in the closure of the function. These functions are deparsed so they can be stored accurately and compactly in the metadata, so the closure is lost. I would substitute() the value of sep_type into the function body instead.