Closed njtierney closed 2 months ago
For option 2 we could default to using terra::fileBlocksize() if ncol and nrow aren't supplied to tar_terra_tiles(). We could also replace ncol and nrow with an argument template that gets passed to the y arg of getTileExtents().
library(terra)
#> terra 1.7.78
x <- rast(system.file("ex/elev.tif", package = "terra"))
getTileExtents(x, y = fileBlocksize(x))
#> xmin xmax ymin ymax
#> [1,] 5.741667 6.533333 49.83333 50.19167
#> [2,] 5.741667 6.533333 49.47500 49.83333
#> [3,] 5.741667 6.533333 49.44167 49.47500
Created on 2024-07-17 with reprex v2.1.0
I like the idea of a template object being passed along.
I think that if we have default behaviour for not specifying nrow/ncol, we would be introducing multiple APIs in the same function, tar_terra_tiles().
This also introduces additional complexity into the code now to manage these two approaches. It's not a lot of extra complexity, but I would prefer to avoid scope creep early.
As a user, I think it could be harder to understand or predict what rast_2x2 and rast_blocksize are doing:
tar_target(
  elev_file,
  system.file("ex/elev.tif", package = "terra"),
  format = "file"
),
tar_terra_rast(
  elev_rast,
  terra::rast(elev_file)
),
tar_terra_tiles(
  name = rast_2x2,
  raster = elev_rast,
  ncol = 2,
  nrow = 2
),
tar_terra_tiles(
  name = rast_blocksize,
  raster = elev_rast
)
So I think your idea of replacing nrow/ncol is good. We might need to make the user work a little harder and supply a template in the form of xmin/xmax/ymin/ymax, but we can provide some tools and examples to make that easier. This also means we can separate out the 3 approaches I listed above into separate functions that all produce extents. I haven't wrapped up these functions, but here is what a pipeline could look like with template:
tar_target(
  elev_file,
  system.file("ex/elev.tif", package = "terra"),
  format = "file"
),
tar_terra_rast(
  elev_rast,
  terra::rast(elev_file)
),
# a 2 x 2 raster covering the same extent, used only as a tiling template
tar_target(
  tile_2x2,
  terra::rast(
    terra::ext(elev_rast),
    ncol = 2,
    nrow = 2,
    crs = terra::crs(elev_rast)
  )
),
tar_target(
  ext_2x2,
  terra::getTileExtents(elev_rast, tile_2x2)
),
tar_target(
  ext_blocksize,
  terra::getTileExtents(elev_rast, y = terra::fileBlocksize(elev_rast))
),
tar_terra_tiles(
  name = rast_2x2,
  raster = elev_rast,
  template = ext_2x2
),
tar_terra_tiles(
  name = rast_blocked,
  raster = elev_rast,
  template = ext_blocksize
)
The thing I don't love about this is that we're no longer taking advantage of an opportunity to introduce some consistency that isn't there with getTileExtents(), so this wouldn't be as much of a value add as with simple ncol and nrow args. template can be a SpatRaster, a SpatVector, or a length one or two numeric vector, which is confusing. Using fileBlocksize() as a default would help so that users don't have to deal with the template arg unless they want to.
I also think it would be ideal if you could just supply the template directly, for simplicity of _targets.R. Something like:
tar_terra_tiles(
  name = rast_blocked,
  raster = elev_rast,
  template = fileBlocksize(elev_rast) # this would be the default if `template` is omitted or NULL
)
or
tar_terra_tiles(
  name = rast_blocked,
  raster = elev_rast,
  template = rast(ext(elev_rast), ncol = 2, nrow = 2, crs = crs(elev_rast))
)
Or maybe, as I think you're suggesting, there are some helper functions so we could do something like:
tar_terra_tiles(
  name = rast_blocked,
  raster = elev_rast,
  template = \(x) tile_blocksize(x) # or just `template = tile_blocksize`
)
or
tar_terra_tiles(
  name = rast_blocked,
  raster = elev_rast,
  template = \(x) tile_n(x, ncol = 2, nrow = 2)
)
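The three ways of supplying template floated above (omitted or NULL, a function of the raster, or a ready-made object) could be told apart roughly as in the sketch below. This is only an illustration: resolve_template(), default_fun, fake_raster and fake_blocksize() are hypothetical names that exist in neither geotargets nor terra, and a real target factory would also have to deal with targets' non-standard evaluation, which is ignored here.

```r
# Hypothetical internal helper (an assumption, not part of any package):
# decide how to interpret the `template` argument.
resolve_template <- function(raster, template = NULL, default_fun) {
  if (is.null(template)) {
    # Nothing supplied: fall back to a default such as the file block size.
    return(default_fun(raster))
  }
  if (is.function(template)) {
    # A helper such as `template = tile_blocksize`: call it on the raster.
    return(template(raster))
  }
  # Anything else is treated as an already-built template object.
  template
}

# Stand-ins so the sketch runs without terra: the "raster" is just its
# dimensions, and the "block size" is a fixed number of rows.
fake_raster <- list(nrow = 90, ncol = 95)
fake_blocksize <- function(x) c(rows = 43, cols = x$ncol)

resolve_template(fake_raster, default_fun = fake_blocksize)
resolve_template(fake_raster, template = function(x) c(rows = 45, cols = 48),
                 default_fun = fake_blocksize)
resolve_template(fake_raster, template = c(2, 2), default_fun = fake_blocksize)
```

With something like this, a user who never mentions template gets the default, while the function and object forms stay available for anyone who wants more control.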
template can be a SpatRaster, a SpatVector, or a length one or two numeric vector, which is confusing
I think perhaps we have overloaded the term template here - you aren't referring to the template arg of tar_terra_tiles()?
I was thinking that template would just be a vector that is the extent: xmin, xmax, ymin, ymax, or perhaps using the tile index information from grout: https://github.com/hypertidy/grout. The reason I like the output from grout is that we have an abstraction around tiling that provides flexibility.
Potentially, you could craft your own extents that aren't exactly tiles. I can't imagine a solid or common use case for that, but it might be handy.
Annoyingly, the extent as returned by terra is a pointer, which means we might need a custom target function to handle terra::ext, so you can't just supply that as I have in my above example. Another reason that just returning data, as grout does, might be a goer.
library(terra)
#> terra 1.7.78
elev_file <- system.file("ex/elev.tif", package = "terra")
r <- rast(elev_file)
r_ext <- ext(r)
class(r_ext)
#> [1] "SpatExtent"
#> attr(,"package")
#> [1] "terra"
as.numeric(r_ext)
#> Error in as.numeric(r_ext): cannot coerce type 'S4' to vector of type 'double'
Created on 2024-07-19 with reprex v2.1.1
Using fileBlocksize() as a default would help so that users don't have to deal with the template arg unless they want to.
Yes, I think I like this as a default for tar_terra_tiles().
I was not imagining an anonymous function, but rather something like:
tar_target(
  rast_blocksize,
  tile_blocksize(x)
),
tar_terra_tiles(
  name = rast_blocked,
  raster = elev_rast,
  template = rast_blocksize # or tile_blocksize(x) or tile_n(x, ncol = 2, nrow = 2)
)
Tagging @mdsumner in here as he wrote grout and I'd be curious to hear his thoughts on this
template can be a SpatRaster, a SpatVector, or a length one or two numeric vector, which is confusing
I think perhaps we have overloaded the term template here - you aren't referring to the template arg of tar_terra_tiles()?
I was thinking that template would just be a vector that is the extent: xmin, xmax, ymin, ymax, or perhaps using the tile index information from grout: https://github.com/hypertidy/grout. The reason I like the output from grout is that we have an abstraction around tiling that provides flexibility.
I was thinking that the template argument of tar_terra_tiles() would just get passed to the y argument of getTileExtents(), but it doesn't have to be that way. I want to avoid having to craft my own object for whatever gets used as a template. I usually will just want to split a raster into a reasonable number of tiles and don't care what shape they are or if they're all even.
Annoyingly, the extent as returned by terra is a pointer, which means we might need a custom target function to handle terra::ext, so you can't just supply that as I have in my above example. Another reason that just returning data, as grout does, might be a goer.
Yeah, this is exactly why I ended up writing create_tile_exts() to return a list of numeric vectors. So instead of being a basically internal function, it could be repurposed as a helper function for the template arg of tar_terra_tiles().
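To make the "list of numeric vectors" idea concrete, here is a minimal base R sketch of what such a helper could return. tile_n_exts() is a hypothetical name and not the actual create_tile_exts() implementation; it takes a plain c(xmin, xmax, ymin, ymax) vector and splits it evenly, without snapping tile edges to raster cell boundaries the way terra would.

```r
# Hypothetical helper (an assumption for illustration only): split an extent
# into ncol x nrow equal tiles. `ext` is c(xmin, xmax, ymin, ymax).
tile_n_exts <- function(ext, ncol, nrow) {
  stopifnot(length(ext) == 4, ncol >= 1, nrow >= 1)
  x_breaks <- seq(ext[[1]], ext[[2]], length.out = ncol + 1)
  # Rows are numbered from the top of the extent, so run y from ymax to ymin.
  y_breaks <- seq(ext[[4]], ext[[3]], length.out = nrow + 1)
  tiles <- vector("list", ncol * nrow)
  i <- 1
  for (row in seq_len(nrow)) {
    for (col in seq_len(ncol)) {
      tiles[[i]] <- c(
        xmin = x_breaks[col],
        xmax = x_breaks[col + 1],
        ymin = y_breaks[row + 1],
        ymax = y_breaks[row]
      )
      i <- i + 1
    }
  }
  tiles
}

# Four plain numeric vectors, ordered row by row from the top left.
tiles <- tile_n_exts(c(0, 4, 0, 2), ncol = 2, nrow = 2)
```

Because each element is ordinary data rather than a pointer-backed object, a list like this can be stored by a regular target and branched over.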
I was not imagining an anonymous function, but rather something like:
tar_target(
  rast_blocksize,
  tile_blocksize(x)
),
tar_terra_tiles(
  name = rast_blocked,
  raster = elev_rast,
  template = rast_blocksize # or tile_blocksize(x) or tile_n(x, ncol = 2, nrow = 2)
)
This would be fine (although I think x above would have to be elev_rast for this to work), but again, I'd like to find a solution where I really don't have to think much about constructing the template object unless I want to. That includes (ideally, from my perspective) not having to create a separate target for it manually.
I think specifying tile size is a good default, as long as we check that it's a valid value. A reasonable fallback is to treat each row as a tile, which is probably what terra does (it's what raster did, and it's what GDAL does).
It doesn't make sense to me to have the user specify a tiling dimension, because the opportunity for optimization is about the size of each tile and a multiplier of that is very simple.
Also, the terra interface for it has some unexpected behaviours for centring the scheme, and I think that logic belongs elsewhere (like terra itself).
I personally need the logic for the schemes completely independently of any package or library. It's extremely powerful: see how much the xarray community cares about "chunking" as one example, or even just determining the number and size of tiles and where they go (anywhere from in memory to S3 storage, for example).
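As a rough illustration of tiling logic that depends on no spatial package, the sketch below builds a tile index from nothing but the raster dimensions and a tile size, with one row per tile as the fallback. tile_index() and its column names are hypothetical and are not grout's API; the example dimensions are simply chosen so that the last strip comes out shorter than the others.

```r
# Hypothetical helper (an assumption, not grout's or terra's API):
# `dims` is c(nrow, ncol) of the raster, `tile_size` is c(rows, cols) per tile.
# The default treats each full-width row as one tile.
tile_index <- function(dims, tile_size = c(1L, dims[2])) {
  stopifnot(
    length(dims) == 2, length(tile_size) == 2,
    all(tile_size >= 1), all(tile_size <= dims) # reject invalid tile sizes
  )
  row_start <- seq(1L, dims[1], by = tile_size[1])
  col_start <- seq(1L, dims[2], by = tile_size[2])
  # Columns vary fastest, so tiles are numbered row by row.
  idx <- expand.grid(col_start = col_start, row_start = row_start)
  data.frame(
    tile = seq_len(nrow(idx)),
    row_start = idx$row_start,
    # Edge tiles are clipped to the raster, so they may be smaller.
    row_end = pmin(idx$row_start + tile_size[1] - 1, dims[1]),
    col_start = idx$col_start,
    col_end = pmin(idx$col_start + tile_size[2] - 1, dims[2])
  )
}

# 90 rows in blocks of 43 rows: two full strips and one short strip.
blocks <- tile_index(c(90, 95), tile_size = c(43, 95))
```

Using a multiple of the block size is then just a matter of scaling tile_size before calling the helper, which is the simple multiplier mentioned above.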
Going to close this, as most of this advice was implemented in #84 and #90.
It seems there could be 3 practical ways to specify tile size