njtierney / geotargets

Targets extensions for geospatial data
https://njtierney.github.io/geotargets/

Implement "filetype" argument #11

Closed Aariq closed 3 months ago

Aariq commented 4 months ago

If tar_* functions are specific to packages and data types, then adding a filetype argument would let users override the default file format in which targets are stored (e.g. GeoTIFF vs. netCDF). I could imagine filetype being an argument to tar_terra_rast(), or an argument to a function supplied to the format argument of tar_terra_rast().

For example:

Option 1

tar_terra_rast <-
  function(name, command, pattern = NULL, filetype = c("GeoTIFF", "netCDF"), ...)

Option 2

tar_terra_rast <-
  function(name, command, pattern = NULL, format = format_terra_rast(filetype = c("GeoTIFF", "netCDF")), ...)

where format_terra_rast() returns the result of a call to tar_format().

The first option is probably preferable, unless there are other customizations that users might need to make to the format.
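
As a rough illustration of Option 1, the sketch below uses base R's `match.arg()` to resolve a vector default to a single validated value. `resolve_filetype()` is a hypothetical helper, not part of geotargets, and the choices simply mirror the ones proposed above.

```r
# Hypothetical helper (not part of geotargets): resolves a vector default
# such as c("GeoTIFF", "netCDF") to one validated value.
resolve_filetype <- function(filetype = c("GeoTIFF", "netCDF")) {
  # match.arg() returns the first choice when the argument is missing
  # and raises an error for values outside the allowed set.
  match.arg(filetype)
}

resolve_filetype()
#> [1] "GeoTIFF"
resolve_filetype("netCDF")
#> [1] "netCDF"
```

With this pattern the first element of the default vector doubles as the default file type, so the allowed values and the default live in one place.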

Aariq commented 4 months ago

Related comment from PR #7:

I had a bit of a play and couldn't quite come up with one. I thought you could pass the argument in and R's lexical scoping would handle it, but not quite.

I think this might be another way to implement this - it is initially more code, but the core of the function I think becomes easier to extend. Thank you to @maelle for showing me https://rlang.r-lib.org/reference/arg_match.html

write_raster_gtiff <- function(object, path) {
    function(object, path) {
        terra::writeRaster(
            x = object,
            filename = path,
            overwrite = TRUE,
            filetype = "GTiff"
        )
    }
}

write_raster_netcdf <- function(object, path) {
    function(object, path) {
        terra::writeRaster(
            x = object,
            filename = path,
            overwrite = TRUE,
            filetype = "netCDF"
        )
    }
}

create_write_fun <- function(filetype = c("GTiff", "netCDF")) {
    # arg_match() returns the matched value, so keep the result
    filetype <- rlang::arg_match(filetype)
    # each branch returns the writer for the matching file type
    switch(filetype,
           "GTiff" = write_raster_gtiff(),
           "netCDF" = write_raster_netcdf()
    )
}

create_write_fun("GTiff")
#> function(object, path) {
#>         terra::writeRaster(
#>             x = object,
#>             filename = path,
#>             overwrite = TRUE,
#>             filetype = "GTiff"
#>         )
#>     }
#> <environment: 0x14f561368>
create_write_fun("netCDF")
#> function(object, path) {
#>         terra::writeRaster(
#>             x = object,
#>             filename = path,
#>             overwrite = TRUE,
#>             filetype = "netCDF"
#>         )
#>     }
#> <environment: 0x139751520>
create_write_fun("wat")
#> Error in `create_write_fun()`:
#> ! `filetype` must be one of "GTiff" or "netCDF", not "wat".

Created on 2024-03-11 with reprex v2.1.0

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       macOS Sonoma 14.3.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Hobart
#>  date     2024-03-11
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.1)
#>  digest        0.6.34  2024-01-11 [1] CRAN (R 4.3.1)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.1)
#>  fansi         1.0.6   2023-12-08 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.0)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.1)
#>  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.1)
#>  knitr         1.45    2023-10-30 [1] CRAN (R 4.3.1)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [2] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [2] CRAN (R 4.3.0)
#>  R.oo          1.26.0  2024-01-24 [2] CRAN (R 4.3.1)
#>  R.utils       2.12.3  2023-11-18 [2] CRAN (R 4.3.1)
#>  reprex        2.1.0   2024-01-11 [2] CRAN (R 4.3.1)
#>  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.1)
#>  rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.3.1)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
#>  sessioninfo   1.2.2   2021-12-06 [2] CRAN (R 4.3.0)
#>  styler        1.10.2  2023-08-29 [2] CRAN (R 4.3.0)
#>  utf8          1.2.4   2023-10-22 [1] CRAN (R 4.3.1)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.1)
#>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.3.1)
#>  xfun          0.42    2024-02-08 [1] CRAN (R 4.3.1)
#>  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.1)
#>
#>  [1] /Users/nick/Library/R/arm64/4.3/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

_Originally posted by @njtierney in https://github.com/njtierney/geotargets/pull/7#discussion_r1519025261_

njtierney commented 4 months ago

We don't actually need to write it out each time!

Here's a demo that shows that filetype is just not evaluated until you use the function:

write_raster_filetype <- function(filetype) {
  function(object, path) {
    cat(filetype)
    # terra::writeRaster(
    #   x = object,
    #   filename = path,
    #   overwrite = TRUE,
    #   filetype = filetype
    # )
  }
}

create_write_fun <- function(filetype) {
  rlang::arg_match0(filetype, c("GTiff", "netCDF"))
  write_raster_filetype(filetype)
}

thingy <- create_write_fun("GTiff")
thingy
#> function(object, path) {
#>     cat(filetype)
#>     # terra::writeRaster(
#>     #   x = object,
#>     #   filename = path,
#>     #   overwrite = TRUE,
#>     #   filetype = filetype
#>     # )
#>   }
#> <environment: 0x1306915c0>
# but then it prints the filetype!
thingy()
#> GTiff

Created on 2024-03-12 with reprex v2.1.0

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       macOS Sonoma 14.3.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Hobart
#>  date     2024-03-12
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.1)
#>  digest        0.6.34  2024-01-11 [1] CRAN (R 4.3.1)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.0)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.1)
#>  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.1)
#>  knitr         1.45    2023-10-30 [1] CRAN (R 4.3.1)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [2] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [2] CRAN (R 4.3.0)
#>  R.oo          1.26.0  2024-01-24 [2] CRAN (R 4.3.1)
#>  R.utils       2.12.3  2023-11-18 [2] CRAN (R 4.3.1)
#>  reprex        2.1.0   2024-01-11 [2] CRAN (R 4.3.1)
#>  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.1)
#>  rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.3.1)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
#>  sessioninfo   1.2.2   2021-12-06 [2] CRAN (R 4.3.0)
#>  styler        1.10.2  2023-08-29 [2] CRAN (R 4.3.0)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.1)
#>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.3.1)
#>  xfun          0.42    2024-02-08 [1] CRAN (R 4.3.1)
#>  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.1)
#>
#>  [1] /Users/nick/Library/R/arm64/4.3/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

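To make the scoping behaviour in the demo concrete, here is a small base R sketch that runs without terra. `make_writer()` is a hypothetical stand-in for `write_raster_filetype()`; it suggests that the returned function does not contain the value itself, only the symbol `filetype`, which it looks up in its enclosing environment when called.

```r
# Hypothetical stand-in for write_raster_filetype(); base R only.
make_writer <- function(filetype) {
  force(filetype)  # evaluate the argument now rather than on first use
  function(object, path) {
    filetype       # looked up in the enclosing environment at call time
  }
}

writer <- make_writer("GTiff")

writer()
#> [1] "GTiff"

# The value lives in the closure's environment...
environment(writer)$filetype
#> [1] "GTiff"

# ...while the body of the function only refers to it by name.
"filetype" %in% all.vars(body(writer))
#> [1] TRUE
```

So a function built this way works as long as its enclosing environment travels with it; the function's code alone is not enough.
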
brownag commented 4 months ago

I think we can generalize the above approaches a bit further, so that we neither have to pre-define functions for every filetype nor independently define the choices for filetype. I define a function create_format_terra_raster() (it could just be format_terra_raster()) to do the work.

As above, when the supplied write function is evaluated by targets, filetype (or any other object that is not an argument of the constructed function) will not be defined.

To work around this, we can create a function of the right form, then modify the body of that function to inject constant values for filetype or other parameters. While we are at it, we can also specify custom GDAL creation options via the gdal argument to writeRaster().

Something like:

tar_terra_rast <- function(name,
                           command,
                           pattern = NULL,
                           filetype = NULL,
                           gdal = NULL,
                           ...,
                           tidy_eval = targets::tar_option_get("tidy_eval"),
                           packages = targets::tar_option_get("packages"),
                           library = targets::tar_option_get("library"),
                           repository = targets::tar_option_get("repository"),
                           iteration = targets::tar_option_get("iteration"),
                           error = targets::tar_option_get("error"),
                           memory = targets::tar_option_get("memory"),
                           garbage_collection = targets::tar_option_get("garbage_collection"),
                           deployment = targets::tar_option_get("deployment"),
                           priority = targets::tar_option_get("priority"),
                           resources = targets::tar_option_get("resources"),
                           storage = targets::tar_option_get("storage"),
                           retrieval = targets::tar_option_get("retrieval"),
                           cue = targets::tar_option_get("cue")) {

    name <- targets::tar_deparse_language(substitute(name))

    envir <- targets::tar_option_get("envir")

    command <- targets::tar_tidy_eval(
        expr = as.expression(substitute(command)),
        envir = envir,
        tidy_eval = tidy_eval
    )

    pattern <- targets::tar_tidy_eval(
        expr = as.expression(substitute(pattern)),
        envir = envir,
        tidy_eval = tidy_eval
    )

    # could pull defaults from geotargets package options
    if (is.null(filetype)) {
        filetype <- "GTiff"
    }

    targets::tar_target_raw(
        name = name,
        command = command,
        pattern = pattern,
        packages = packages,
        library = library,
        format = create_format_terra_raster(filetype = filetype, gdal = gdal, ...),
        # ...
    )
}

#' @param filetype File format expressed as GDAL driver names passed to `terra::writeRaster()`
#' @param gdal GDAL driver specific datasource creation options passed to `terra::writeRaster()`
#' @param ... Additional arguments not yet used
#' @noRd
create_format_terra_raster <- function(filetype, gdal, ...) {

    if (!requireNamespace("terra")) {
        stop("package 'terra' is required", call. = FALSE)
    }

    # get list of drivers available for writing depending on what the user's GDAL supports
    drv <- terra::gdal(drivers = TRUE)
    drv <- drv[drv$type == "raster" & grepl("write", drv$can), ]

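    # apply the default before validating: match.arg(NULL, choices) would
    # silently return the first available driver instead of "GTiff"
    if (is.null(filetype)) {
        filetype <- "GTiff"
    }

    filetype <- match.arg(filetype, drv$name)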

    .write_terra_raster <- function(object, path) {
        terra::writeRaster(
            object,
            path,
            filetype = NULL,
            overwrite = TRUE,
            gdal = NULL
        )
    }
    body(.write_terra_raster)[[2]][["filetype"]] <- filetype
    body(.write_terra_raster)[[2]][["gdal"]] <- gdal

    targets::tar_format(
        read = function(path) terra::rast(path),
        write = .write_terra_raster,
        marshal = function(object) terra::wrap(object),
        unmarshal = function(object) terra::unwrap(object)
    )
}

Aariq commented 4 months ago

That's cool. Never seen body() before, but that's what I was looking for.
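
For anyone else new to `body()`, here is a minimal base R sketch of the injection technique, assuming nothing beyond base R. `make_write_fun()` is a hypothetical name, and `list()` stands in for `terra::writeRaster()` so the example runs without terra.

```r
# Hypothetical factory illustrating body() injection; base R only.
make_write_fun <- function(filetype) {
  # Template with a placeholder value; list() stands in for the real writer.
  write_fun <- function(object, path) {
    list(path = path, filetype = NULL, overwrite = TRUE)
  }
  # body(write_fun) is the `{` call; its second element is the list() call.
  # Assigning to the named argument replaces the placeholder with a constant.
  body(write_fun)[[2]][["filetype"]] <- filetype
  write_fun
}

write_fun <- make_write_fun("netCDF")

# Drop the factory environment to mimic the function being used elsewhere.
environment(write_fun) <- baseenv()

write_fun(object = NULL, path = "out.nc")$filetype
#> [1] "netCDF"

# The body no longer refers to a `filetype` variable at all.
all.vars(body(write_fun))
#> [1] "path"
```

Because the value is written into the function body as a constant, the function no longer depends on the environment it was created in, which is the property the write function needs here.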

Aariq commented 3 months ago

Implemented for tar_terra_rast() in #15, but tar_terra_vect() still needs an implementation.