njtierney / geotargets

Targets extensions for geospatial data
https://njtierney.github.io/geotargets/

Functionality for objects *containing* `terra` objects? #77

Closed: geryan closed this issue 1 month ago

geryan commented 1 month ago

In projects that iterate over, e.g., scenarios or countries, I often find it convenient to organise spatial objects as list-column elements of tables, where some columns contain variables and others contain spatial inputs or results.

However, neither geotargets nor targets can handle these objects at present (as far as I can tell).

This functionality would be helpful.

The examples below, modified from the README, store a SpatVector in a list.

library(targets)
library(geotargets)

###### using `tar_terra_vect`

tar_dir({ # tar_dir() runs code from a temporary directory.
  tar_script({
    library(geotargets)
    lux_area <- function(projection = "EPSG:4326") {
      terra::project(
        terra::vect(system.file("ex", "lux.shp",
                                package = "terra"
        )),
        projection
      )
    }
    list(
      tar_terra_vect(
        terra_vect_example,
        lux_area() |> list()
      )
    )
  })
  tar_make()
  x <- tar_read(terra_vect_example)
  x
})
#> ▶ dispatched target terra_vect_example
#> ✖ errored target terra_vect_example
#> ✖ errored pipeline [0.062 seconds]
#> Error:
#> ! Error running targets::tar_make()
#> Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
#> Debugging guide: https://books.ropensci.org/targets/debugging.html
#> How to ask for help: https://books.ropensci.org/targets/help.html
#> Last error message:
#>     _store_ unable to find an inherited method for function ‘writeVector’ for signature ‘"list", "character"’
#> Last error traceback:
#>     No traceback available.

###### using `tar_target`

tar_dir({ # tar_dir() runs code from a temporary directory.
  tar_script({
    library(geotargets)
    lux_area <- function(projection = "EPSG:4326") {
      terra::project(
        terra::vect(system.file("ex", "lux.shp",
                                package = "terra"
        )),
        projection
      )
    }
    list(
      tar_target(
        terra_vect_example,
        lux_area() |> list()
      )
    )
  })
  tar_make()
  x <- tar_read(terra_vect_example)
  x
})
#> ▶ dispatched target terra_vect_example
#> ● completed target terra_vect_example [1.242 seconds]
#> ▶ ended pipeline [1.856 seconds]
#> [[1]]
#> Error: external pointer is not valid

Created on 2024-05-24 with reprex v2.1.0

geryan commented 1 month ago
njtierney commented 1 month ago

Thanks for posting the issue @geryan !

I know the problem you mean, sometimes it is handy to put things in tables or lists to keep them together.

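I think the crux of this problem is that we would need read and write methods (plus marshal/unmarshal methods) for these objects, like the ones geotargets defines for SpatRaster targets:

# Storage format for SpatRaster targets; in geotargets this is passed as the
# `format` argument of the underlying target
format = targets::tar_format(
    # Read a stored raster back from disk
    read = function(path) terra::rast(path),
    # Write with the GDAL driver and creation options set via env vars
    write = function(object, path) {
        terra::writeRaster(
            object,
            path,
            filetype = Sys.getenv("GEOTARGETS_GDAL_RASTER_DRIVER"),
            overwrite = TRUE,
            gdal = strsplit(
                Sys.getenv("GEOTARGETS_GDAL_RASTER_CREATION_OPTIONS",
                           unset = ";"),
                ";")[[1]]
        )
    },
    # Pack/unpack so the object survives transfer between R processes
    marshal = function(object) terra::wrap(object),
    unmarshal = function(object) terra::unwrap(object)
)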

For lists, perhaps we could wrap each of these steps (reading, writing, marshalling and unmarshalling) in some kind of lapply()?
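A minimal sketch of that lapply() idea, assuming every list element is a SpatVector or SpatRaster that terra::wrap() can pack. Because a custom format writes to a single `path`, this version serializes the whole list of packed objects to one RDS file rather than writing one GDAL file per element. The helper names (`write_terra_list()`, `read_terra_list()`, etc.) and `format_terra_list` are hypothetical, not part of geotargets.

```r
library(terra)
library(targets)

# Hypothetical helpers (not part of geotargets): apply terra's pack/unpack
# functions element-wise so a plain list of terra objects can be stored.
write_terra_list <- function(object, path) {
  # Pack each SpatVector/SpatRaster, then store the whole list in one RDS file
  saveRDS(lapply(object, terra::wrap), path)
}
read_terra_list <- function(path) {
  # Rebuild live terra objects (with valid external pointers) from the packed list
  lapply(readRDS(path), terra::unwrap)
}
marshal_terra_list <- function(object) lapply(object, terra::wrap)
unmarshal_terra_list <- function(object) lapply(object, terra::unwrap)

# Custom targets format string built from the helpers above
format_terra_list <- targets::tar_format(
  read = read_terra_list,
  write = write_terra_list,
  marshal = marshal_terra_list,
  unmarshal = unmarshal_terra_list
)
```

In a _targets.R script, the string would then be passed as `format = format_terra_list` to tar_target(). Nested or heterogeneous lists would still need extra handling, since this assumes a flat list of terra objects.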

There might be a solution to this in https://github.com/njtierney/geotargets/pull/76

I'll have to think about this some more.

But I think that we could probably get some of the way towards solving this problem using branching in targets. I've got Friday Brain right now so hopefully this will be clearer for Monday Brain to think about!

brownag commented 1 month ago

I think at a minimum we should add an analog for SpatVectorCollection (i.e., tar_terra_svc() for terra::svc() results).

My initial response in the case of lists, which can be nested and heterogeneous, is that this may be a bit difficult to handle in general for geotargets methods.

However, I can totally see list columns within a data.frame being a useful way to manage multiple terra objects and their metadata, so this is perhaps worth considering more.
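A minimal sketch of the list-column pattern, assuming each cell holds a SpatVector (or SpatRaster) that terra::wrap() can pack. The terra column is packed before the table is serialized and unpacked after it is read back, which avoids the "external pointer is not valid" error that appears when a live terra object is saved as a plain R object. The helpers `pack_terra_column()` and `unpack_terra_column()` are hypothetical, not geotargets functions.

```r
library(terra)

# Hypothetical helper: replace every terra object in list column `col` with its
# packed (serializable) form from terra::wrap()
pack_terra_column <- function(df, col) {
  df[[col]] <- lapply(df[[col]], terra::wrap)
  df
}

# Hypothetical helper: restore live terra objects from the packed column
unpack_terra_column <- function(df, col) {
  df[[col]] <- lapply(df[[col]], terra::unwrap)
  df
}

lux <- terra::vect(system.file("ex", "lux.shp", package = "terra"))

# One row per scenario: a plain variable plus a list column of SpatVectors
scenarios <- data.frame(scenario = c("full", "subset"))
scenarios$area <- list(lux, lux[1:3, ])

# Round trip through an RDS file, as a default targets target does with saveRDS()
path <- tempfile(fileext = ".rds")
saveRDS(pack_terra_column(scenarios, "area"), path)
restored <- unpack_terra_column(readRDS(path), "area")
```

The same two helpers could back a custom tar_format() for such tables, with the caveat that the column holding terra objects has to be known in advance.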

Aariq commented 1 month ago

You can use branching in targets to iterate over lists of terra objects. Here's a modified version of your example:

library(targets)

tar_dir({ # tar_dir() runs code from a temporary directory.
    tar_script({
        library(geotargets)
        library(tarchetypes)
        lux_area <- function(file, projection = "EPSG:4326") {
            terra::project(
                terra::vect(file),
                projection
            )
        }
        list(
            tar_files(
                files,
                c(system.file("ex", "lux.shp", package = "terra"),
                  system.file("ex", "lux.shp", package = "terra"))
            ),
            tar_terra_vect(
                terra_vect_example,
                lux_area(files),
                pattern = map(files),
                iteration = "list"
            )
        )
    })

    tar_make()
    x <- tar_read(terra_vect_example)
    x
})
#> ▶ dispatched target files_files
#> ● completed target files_files [0.004 seconds]
#> ▶ dispatched branch files_2e9093554907f947
#> ● completed branch files_2e9093554907f947 [0 seconds]
#> ▶ dispatched branch files_25283503ca445cc3
#> ● completed branch files_25283503ca445cc3 [0 seconds]
#> ● completed pattern files
#> ▶ dispatched branch terra_vect_example_d7f903dd8a887dcb
#> ● completed branch terra_vect_example_d7f903dd8a887dcb [0.03 seconds]
#> ▶ dispatched branch terra_vect_example_27eadfd4cadd92eb
#> ● completed branch terra_vect_example_27eadfd4cadd92eb [0.037 seconds]
#> ● completed pattern terra_vect_example
#> ▶ ended pipeline [0.331 seconds]
#> $terra_vect_example_d7f903dd8a887dcb
#>  class       : SpatVector 
#>  geometry    : polygons 
#>  dimensions  : 12, 6  (geometries, attributes)
#>  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
#>  source      : terra_vect_example_d7f903dd8a887dcb
#>  coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#>  names       :  ID_1   NAME_1  ID_2   NAME_2  AREA   POP
#>  type        : <num>    <chr> <num>    <chr> <num> <int>
#>  values      :     1 Diekirch     1 Clervaux   312 18081
#>                    1 Diekirch     2 Diekirch   218 32543
#>                    1 Diekirch     3  Redange   259 18664
#> 
#> $terra_vect_example_27eadfd4cadd92eb
#>  class       : SpatVector 
#>  geometry    : polygons 
#>  dimensions  : 12, 6  (geometries, attributes)
#>  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
#>  source      : terra_vect_example_27eadfd4cadd92eb
#>  coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#>  names       :  ID_1   NAME_1  ID_2   NAME_2  AREA   POP
#>  type        : <num>    <chr> <num>    <chr> <num> <int>
#>  values      :     1 Diekirch     1 Clervaux   312 18081
#>                    1 Diekirch     2 Diekirch   218 32543
#>                    1 Diekirch     3  Redange   259 18664

Created on 2024-05-24 with reprex v2.1.0

Note that iteration must be "list" because there is not (currently) a valid method for the default vctrs::vec_c() to combine SpatRaster or SpatVector objects. Also, you need something like tarchetypes::tar_files(), because branching over a target with format = "file" is not allowed: you'll get an error if the files target is constructed with tar_target() and format = "file".
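The run log above shows tar_files() creating a plain `files_files` target plus a dynamic `files` target with one branch per path. A rough equivalent using only tar_target() is sketched below, assuming the `files` branches use format = "file" so each file is tracked individually. To keep the sketch dependent on only targets and terra, tar_terra_vect() is replaced by a hypothetical `n_geoms` target that just counts geometries.

```r
library(targets)

n_geoms <- tar_dir({ # run in a temporary directory, as in the examples above
  tar_script({
    list(
      # Step 1: an ordinary (non-file) target holding the file paths
      tar_target(
        files_files,
        c(system.file("ex", "lux.shp", package = "terra"),
          system.file("ex", "lux.shp", package = "terra"))
      ),
      # Step 2: one format = "file" branch per path, so each file is tracked
      tar_target(
        files,
        files_files,
        pattern = map(files_files),
        format = "file"
      ),
      # Step 3: branching over `files` is allowed because it is itself a
      # dynamic (patterned) target
      tar_target(
        n_geoms,
        nrow(terra::vect(files)),
        pattern = map(files)
      )
    )
  })
  tar_make(reporter = "silent")
  tar_read(n_geoms)
})
```

Because `n_geoms` returns plain numbers, the default "vector" iteration combines its branches without needing iteration = "list".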

geryan commented 1 month ago

Thanks @Aariq, this is a really helpful approach!