njtierney / geotargets

Targets extensions for geospatial data
https://njtierney.github.io/geotargets/
Other
49 stars 4 forks source link

Supporting various split-apply-combine spatial workflows #74

Open Aariq opened 1 month ago

Aariq commented 1 month ago

So there are at least 4 different ways (yikes!) to do a split-apply-combine type workflow with spatial data just in terra that I've encountered. It may be beneficial to figure out a way to implement these workflows using targets branching in geotargets

1) SpatRasters or SpatVectors with multiple layers can be split with [[ and recombined with c() or rast()/vect(). Also can be iterated over with lapply() or coerced to a list with as.list() 2) SpatRasterCollections hold a list of SpatRasters, possibly with different extents, resolutions, and projections. Can be split with [ (but not [[) and combined with terra::sprc(). lapply() also works on them and can be coerced with as.list(). (see #53) 3) SpatRasterDatasets hold a list of SpatRasters with the same resolution and projection (sub-datasets). Can be split with [ or [[ and re-combined with terra::sds(). lapply() also works and can be coerced with as.list(). names are lost in the process (I assume this is a bug: https://github.com/rspatial/terra/issues/1513). (see #59) 4) makeTiles() splits a raster into tiles that are saved to disk and returns a vector of file paths. Can be opened individually, worked on, and re-combined with merge(sprc(rast(<list of tiles as SpatRasters>))). Alternatively, the files on disk can be opened with vrt() (see #69)

Some open questions:

  1. Which of these (if any) should we try to support better in geotargets (e.g. with target factories)?
  2. Are there existing similar patterns we can borrow from other Targetopia packages? (knowing that we can't customize the behavior of the iteration argument to tar_target_raw())
  3. Will any of these be easier to do if we adopt an approach like #63? (I suspect that a tar_terra_tiles() would be easier like this)
  4. Should we try to figure this stuff out soon, or later after we've added basic support for sf and stars?
Aariq commented 1 month ago

A 5th way: just iterating over lists of rasters or vectors (#77).

Given that these keep coming up, and that it is confusing complicated stuff to implement (see my ongoing struggles in #76), it might be prudent to find a time we (@njtierney, I, and possibly @brownag) can talk in real-time about this. I want to get some kind of vision about how much of this we build in to geotargets and how much we just document in vignettes and leave it to the users (e.g. my response in #77, or how I implemented iterating over tiles before #76). And if we do implement some solutions, how generalized can/should they be? Would each of these 5 ways need bespoke target factories if we implemented something for them in geotargets?

Aariq commented 1 month ago

This is what a workflow for 1. (iterating over layers) looks like:

library(targets)

tar_dir({ # tar_dir() runs code from a temporary directory.
    tar_script({
        library(geotargets)
        library(terra)
        make_rast <- function() {
            a <- terra::rast(nrows = 108, ncols = 21, xmin = 0, xmax = 10)
            values(a) <- 10
            names(a) <- "layer 1"
            b <- terra::rast(nrows = 108, ncols = 21, xmin = 0, xmax = 10)
            values(b) <- 100
            names(b) <- "layer 2"
            c(a, b)
        }
        sub_lyr <- function(rast, index) {
            rast[[index]]
        }
        combine_rast <- function(rast_list) {
            real_names <- lapply(rast_list, names)
            out <- rast(rast_list)
            names(out) <- real_names
            out
        }
        list(
            tar_terra_rast(
                rast_example,
                make_rast()
            ),
            tar_target(
                layers,
                names(rast_example)
            ),
            tar_terra_rast(
                rast_list,
                sub_lyr(rast_example, layers),
                pattern = map(layers),
                iteration = "list"
            ),
            tar_terra_rast(
                rast_plus_2,
                rast_list + 2,
                pattern = map(rast_list),
                iteration = "list"
            ),
            tar_terra_rast(
                rast_combined,
                combine_rast(rast_plus_2)
            )
        )
    })

    tar_make()
    tar_read(rast_combined)
})
#> terra 1.7.71
#> ▶ dispatched target rast_example
#> ● completed target rast_example [0.088 seconds]
#> ▶ dispatched target layers
#> ● completed target layers [0 seconds]
#> ▶ dispatched branch rast_list_93203888e2b38c7f
#> ● completed branch rast_list_93203888e2b38c7f [0.004 seconds]
#> ▶ dispatched branch rast_list_a6b0d01bda4d921c
#> ● completed branch rast_list_a6b0d01bda4d921c [0.023 seconds]
#> ● completed pattern rast_list
#> ▶ dispatched branch rast_plus_2_2e5221e915e2a41b
#> ● completed branch rast_plus_2_2e5221e915e2a41b [0.004 seconds]
#> ▶ dispatched branch rast_plus_2_589aaf2880aa3b86
#> ● completed branch rast_plus_2_589aaf2880aa3b86 [0.003 seconds]
#> ● completed pattern rast_plus_2
#> ▶ dispatched target rast_combined
#> ● completed target rast_combined [0.022 seconds]
#> ▶ ended pipeline [0.503 seconds]
#> class       : SpatRaster 
#> dimensions  : 108, 21, 2  (nrow, ncol, nlyr)
#> resolution  : 0.4761905, 1.666667  (x, y)
#> extent      : 0, 10, -90, 90  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source      : rast_combined 
#> names       : layer 1, layer 2 
#> min values  :      12,     102 
#> max values  :      12,     102

Created on 2024-05-24 with reprex v2.1.0

geotargets could provide a tar_terra_map_lyr() function that would make all the targets necessary to get to rast_plus_2 from

tar_terra_map_lyr(
  rast_plus_2,
  rast_example + 2
)

I'm not sure how often this workflow would really be necessary or computationally more efficient than just rast_example + 2 (or whatever), so I'd put this potential target factory at low priority.