rspatial / terra

R package for spatial data handling https://rspatial.github.io/terra/reference/terra-package.html
GNU General Public License v3.0
544 stars 90 forks source link

Incorrect values & weird blockiness from terra::project when projecting SpatRaster with 100s of layers #1327

Open nafreymueller opened 1 year ago

nafreymueller commented 1 year ago

Hi,

I came across this issue where incorrect values are returned in this weird blocky pattern when I run terra::project on a SpatRaster, but this only starts to happen after that SpatRaster starts having more than ~200 layers. The issue persists even after restarting R.

The actual code is just to illustrate the issue - it'd be cumbersome to attach the entire NetCDF file with 100s of layers.

# my.nc is a file with historical Arctic sea ice concentration reconstructions from a climate model in WGS84 projection
# monthstoextract is a numeric vector denoting various summer months in various years since 1849
# conts.ras90 is a SpatRaster in the pan-Arctic projection
# conts.ps is a sf polygon that is used to help see the coastlines in the pan-Arctic projection

# processing
n <- 1:10 # layers to extract from
icec <- terra::rast(my.nc)[[monthstoextract[n]]] # reading in data
a2 <- icec[[2]] # filtering to a SpatRaster with a single layer
a2.p <- terra::project(x=a2, y=conts.ras90) # re-projecting the SpatRaster with a single layer

# plotting
par(mfrow=c(1,1))
plot(a2.p, main='June 1849 - single layer'); plot(conts.ps$geometry, add=T)

project1

Now I run the following code where I project a SpatRaster with multiple layers.

I do this four times, changing n each time (n <- 1:10; n <- 1:200; n <- 1:250; n <- 1:680).

I then plot icec[[2]] four times with the four different n's. icec[[2]] is the exact same layer as a2.p from earlier. The only difference is that it was in a SpatRaster that had multiple layers when it was reprojected:

par(mfrow=c(2,2))

n <- 1:10 # layers to extract from
icec <- terra::rast(my.nc)[[monthstoextract[n]]] # reading in data
icec <- terra::project(x=icec, y=conts.ras90) # re-projecting the original SpatRaster with many layers
plot(icec[[2]], main='June 1849 - 10 layers'); plot(conts.ps$geometry, add=T)

n <- 1:200 # layers to extract from
icec <- terra::rast(my.nc)[[monthstoextract[n]]] # reading in data
icec <- terra::project(x=icec, y=conts.ras90) # re-projecting the original SpatRaster with many layers
plot(icec[[2]], main='June 1849 - 200 layers'); plot(conts.ps$geometry, add=T)

n <- 1:250 # layers to extract from
icec <- terra::rast(my.nc)[[monthstoextract[n]]] # reading in data
icec <- terra::project(x=icec, y=conts.ras90) # re-projecting the original SpatRaster with many layers
plot(icec[[2]], main='June 1849 - 250 layers'); plot(conts.ps$geometry, add=T)

n <- 1:680 # layers to extract from
icec <- terra::rast(my.nc)[[monthstoextract[n]]] # reading in data
icec <- terra::project(x=icec, y=conts.ras90) # re-projecting the original SpatRaster with many layers
plot(icec[[2]], main='June 1849 - 680 layers'); plot(conts.ps$geometry, add=T)

project_10_200_250_680

Everything seems fine for n <- 1:10 and n <- 1:200. But when I repeat with n <- 1:250, these weird blocky patterns emerge in various locations (the Bering Sea is the most severe). This progressively worsens with increasing numbers of layers in the SpatRaster, terminating at 680, which is currently the maximum number of layers I'm working with. Changing n <- 1:450 (not shown) resulted in a blockiness level that was intermediate between n <- 1:250 and n <- 1:680. When I played around with other values, >200 layers seemed to be the cutoff where this behaviour started.

I should be able to work around this for now by just running a for loop where I filter the SpatLayer to a single layer (or only a few layers), reproject, write to disk, and can then load it in later. However, I know that I will likely need to be working with SpatRasters with 1000s of layers soon, and a for loop just won't scale well at that point.

Any thoughts/insights as to why this is happening? I initially thought it might be a wrapping-around-the-antimeridian issue, but this blockiness persists elsewhere (e.g., Sea of Okhotsk, Baffin Bay, south of Svalbard, etc.).

Cheers, Nick

kadyb commented 1 year ago

I suspect this problem may be due to processing raster in blocks from disk. If the raster is too large to be loaded into memory, it is processed sequentially in blocks. Check mem_info().

nafreymueller commented 1 year ago

Here's what I get when I tried your suggestion:

free_RAM()/1e6 # converting from kB to GB
[1] 4.336736

n <- 1:680
icec <- terra::rast(my.nc)[[monthstoextract[n]]] # reading in data
mem_info(icec)

------------------------
Memory (GB) 
------------------------
check threshold : 1  (memmin)
available       : 3.99
allowed (60%)   : 2.39
needed (n=1)    : 0.33
------------------------
proc in memory  : TRUE
nr chunks       : 1
------------------------

Or is the n argument in mem_info() supposed to be the number of layers in the SpatRaster? The documentation for mem_info() says "n positive integer. The number of copies of x that are needed", which I guess could refer to the number of layers. That would make sense in terms of exceeding the 60% allotted memory at some point. I start exceeding the 60% allotted memory at n≥505 layers.

It starts using 2 chunks at n≥307, and uses 3 at n≥615, but this blockiness behaviour started sooner when I was reprojecting ≥~200 layers. If I had to guess, maybe this is cause it's using additional memory to do the reprojecting, and that's what's taking it over the 60% memory allotment? I hadn't seen this blockiness issue when using raster::projectRaster() in the past, but maybe that's just a fundamental difference between how raster and terra allocate memory and do all the things within R vs offloading to C++?

n <- 1:680
icec <- terra::rast(my.nc)[[monthstoextract[n]]] # reading in data

mem_info(x=icec, n=307)

------------------------
Memory (GB) 
------------------------
check threshold : 1  (memmin)
available       : 4.11
allowed (60%)   : 2.47
needed (n=307)   : 1.48
------------------------
proc in memory  : TRUE
nr chunks       : 2

mem_info(x=icec, n=505)

------------------------
Memory (GB) 
------------------------
check threshold : 1  (memmin)
available       : 4.06
allowed (60%)   : 2.44
needed (n=505)   : 2.44
------------------------
proc in memory  : FALSE
nr chunks       : 2
------------------------

mem_info(x=icec, n=680)

------------------------
Memory (GB) 
------------------------
check threshold : 1  (memmin)
available       : 4.1
allowed (60%)   : 2.46
needed (n=680)   : 3.28
------------------------
proc in memory  : FALSE
nr chunks       : 3
------------------------
kadyb commented 1 year ago

Or is the n argument in mem_info() supposed to be the number of layers in the SpatRaster?

mem_info() gives the required memory for the whole object -- all layers, not just one. I think that this raster will be copied at most a few times, not a few hundred, so the value of the n argument should be low. I think that the problem is not in processing data from the disk, because the raster is completely loaded into memory in this case.

kadyb commented 1 year ago

I think I can reproduce this and this is some bug in project():

library("terra") # 1.7.55
gdal(lib = "")
#>    gdal     proj     geos
#> "3.6.2"  "9.2.0" "3.11.2"
set.seed(1)

n = 300
nlyr = 300
v = sample(1:5, size = n ^ 2, replace = TRUE)
v = rep(v, times = nlyr)
r = rast(nrows = n, ncols = n, nlyrs = nlyr, vals = v, crs = "EPSG:4326")

rr_1 = project(r[[1]], "EPSG:3857", method = "bilinear") # one layer
rr_300 = project(r, "EPSG:3857", method = "bilinear") # all layers

par(mfrow = c(1, 2))
plot(rr_1, legend = FALSE, axes = FALSE)
plot(rr_300[[nlyr]], legend = FALSE, axes = FALSE)

Rplot

mdsumner commented 1 year ago

@kadyb it's not sensible to blithely reproject a global longlat raster to EPSG:3857 - it's natural to get strange shapes like that when the bounds of the projection are asymptotic - please use a target grid not just a string, like the OP has done

@nafreymueller can you please provide a reproducible example, enough would be one layer from the dataset your are projecting and the one you are projecting to - this information is not in the OP, there's more than one "pan-Arctic projection"

I'm chiming in here because I think the mem and projection to Mercator diversions are unhelpful, and ultimately this is GDAL doing the reprojecting so I'd like to see if it's reproducible there - there's been a lot of anti-meridian related fixes in the GDAL warper recently, and more are still needed - it would be enough if you could share one of your source files and the specifications of conts.ras90

kadyb commented 1 year ago

@mdsumner, I just wanted to create some reprex. In my example, for identical inputs, shouldn't the result be identical regardless of the number of layers? I would expect that the result should be the same.

mdsumner commented 1 year ago

oh I see, I'm sorry you're right

nafreymueller commented 1 year ago

Thanks @mdsumner and @kadyb for your insights. Here's a worked example with one raster (example rasters in zip file attached at the bottom):

library("terra") # 1.7.55

### projections
# wgs1984 projection
wgs84 <- '+proj=longlat +datum=WGS84 +no_defs'

# pan-arctic projection with prime meridian as central meridian - units are km
# this is the projection that iceconc should be projected to
ps.crs_km <- "+proj=stere +lat_0=90 +lat_ts=71 +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=km +no_defs +ellps=WGS84 +towgs84=0,0,0"

### raster data
# ice concentration from June 1849 in wgs1894 projection
iceconc <- terra::rast('iceconc.tif')
iceconc; plot(iceconc)

# binary raster in the pan-arctic projection denoting where ocean grid cells are
arcticrast <- terra::rast('arcticrast.tif')
arcticrast; plot(arcticrast)

### replicating iceconc many times
iceconc680 <- rep(iceconc, 680)

### reprojecting iceconc and iceconc680
# single layer
iceconc_reproj <- terra::project(x=iceconc, y=arcticrast)
iceconc_reproj; plot(iceconc_reproj)
# 680 layers
iceconc680_reproj <- terra::project(x=iceconc680, y=arcticrast)
iceconc680_reproj; plot(iceconc680_reproj[[1]])

When I reproject a SpatRaster with 1 layer, I get this, which seems reasonable: iceconc1

But when I reproject a SpatRaster with 680 layers, I get this block pattern: iceconc680

It's worth noting that I now get this warming:

Warning message:
/myWorkingDirectory/projectIssueExample/iceconc.tif: AdviseRead(): nBandCount cannot be greater than 1 (GDAL error 5)

Maybe this indicates an underlying GDAL issue after all? Is there a single fix within terra for this?

projectIssueExample.zip

rhijmans commented 11 months ago

Update (no solution yet). I see this happening with > 110 layers

library("terra") # 1.7.55
wgs84 <- '+proj=longlat +datum=WGS84 +no_defs'
ps.crs_km <- "+proj=stere +lat_0=90 +lat_ts=71 +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=km +no_defs +ellps=WGS84 +towgs84=0,0,0"

ice <- terra::rast('iceconc.tif')
arctic <- terra::rast('arcticrast.tif')
ice111 <- rep(ice, 111)

# single layer
icep <- terra::project(x=ice, y=arctic)
# 111 layers
ice111p <- terra::project(x=ice111, y=arctic)

plot( icep - ice111p[[1]] )
mdsumner commented 11 months ago

I am pretty certain this is a GDAL issue, will link a bug report.

mdsumner commented 11 months ago

ok it's about the working memory (increase it with options = c("-wo", "600") which takes it from 60Mb to 600Mb - but, I don't think project() supports that, it might be possibl by config. I haven't figured out if setting XSCALE/YSCALE can fix but I'm interested to learn about that more.