r-lidar / lidR

Airborne LiDAR data manipulation and visualisation for forestry application
https://CRAN.R-project.org/package=lidR
GNU General Public License v3.0
614 stars 134 forks source link

Large rasters cause integer overflow in `raster_cell_from_xy()` #785

Open bfrank5 opened 1 month ago

bfrank5 commented 1 month ago

Hello,

I noticed an issue attempting to use a large DTM (~100,000 ha, 0.3 m resolution) during the normalization of points. The error I receive is just a warning indicating that the normalization algorithm switches during the operation to some default. Regardless of this specific context, I narrowed down the issue to a problem in raster_cell_from_xy() which must convert (in my case) extremely large cell indices into integers, which exceeds the capacity of as.integer, forcing the indices to return as NA values. Here is a reproducible example:

library(terra)

extent <- ext(c(0, 1e05, 0, 1e05))
raster <- rast(extent, resolution = 1, vals = 0)

# Create an arbitrary point
x <- 50
y <- 50

# Get the cell index
lidR:::raster_cell_from_xy(raster, x, y)

Producing the output:

[1] NA
Warning message:
In lidR:::raster_cell_from_xy(raster, x, y) :
  NAs introduced by coercion to integer range

I believe the particular line in question is located here:

https://github.com/r-lidar/lidR/blob/09a878bca524122a19fc40d7c76db5780236dcaa/R/utils_raster.R#L75

Jean-Romain commented 1 month ago

Can you clip your dtm to the extent of your point cloud? At some point R has on 32 bits signed integer and 64 bits double. This limits how many integers we can handle. Do you know how terra deals with that ?

bfrank5 commented 1 month ago

Hey JR,

I think the clipping is a workaround, but I was having trouble with parallel processing in catalog_apply using terra::crop. I run into a std::bad_alloc error, but I don't have a lot of specific information yet (nor anything reproducible). I could potentially tile the dtm into separate files and dynamically fetch the file path in catalog_apply, but trying a few other ideas first.

(Edit: I resolved the std::bad_alloc error, it turned out I was clipping with the wrong bounding box)

I am playing with simply removing the as.integer() call right now using a fork of the repository, and it appears to work fine, obviously you will want a more rigorous solution. I am still checking a larger dataset now.

As far as I can tell, at least for my application using normalize_heights, the next step is here:

https://github.com/r-lidar/lidR/blob/09a878bca524122a19fc40d7c76db5780236dcaa/R/utils_raster.R#L94-L121

where cells is the vector if indices attained previously. Drilling down, the terra::crop implementation ultimately goes here:

https://github.com/rspatial/terra/blob/e5458c22fc97a51bb73523ac8c971e975cda0afa/R/extract_single.R#L138-L155

I think the operable part here is rowColFromCell, which you can see the typing for here in the header file here:

https://github.com/rspatial/terra/blob/e5458c22fc97a51bb73523ac8c971e975cda0afa/src/spatRaster.h#L426

and appears to take a double. I can't speak to the other methods (e.g., raster::extract).

I wonder if it is sufficient to check if it is a whole number? And even then, terra::crop rounds the indices initially to the nearest whole number.

Up to you! I mostly wanted to stick this here in case anyone else runs into the same problem.