r-lidar / lasR

Fast and Pipable Airborne LiDAR Data Tools
https://r-lidar.github.io/lasR/
GNU General Public License v3.0

Segfault on pipeline execution #48

Closed cedricr closed 2 weeks ago

cedricr commented 2 weeks ago

I’m trying to extract vegetation data from the French IGN LidarHD data.

I’m running a simple pipeline on all tiles intersecting a city geometry, in order to generate a CHM. In some cases, the pipeline crashes (a hard crash in RStudio, with an "R Session Aborted" dialog that doesn't let me see any error message). In VSCode, I get the following error:

*** caught segfault ***
address 0x7f21b7000008, cause 'memory not mapped'

Here’s a minimal repro:

library(lasR)
url = "https://storage.sbg.cloud.ovh.net/v1/AUTH_63234f509d6048bca3c9fd7928720ca1/ppk-lidar/SP/LHD_FXX_1042_6297_PTS_C_LAMB93_IGN69.copc.laz"
filename = "test.copc.laz"
download.file(url, filename)
pipeline <- reader_las() + normalize() + delete_points(keep_class(5)) 
exec(pipeline, on = filename, progress = TRUE)

What I think is happening: this tile is mostly sea and beach, near Nice. There’s no vegetation (class 5) in it at all, so the delete_points step produces an empty point cloud. Indeed, if I keep class 2 instead of 5, everything works. That being said, if I remove the normalize step, the pipeline doesn’t crash…

If I rewrite this pipeline using lidR:

library(lidR)
url = "https://storage.sbg.cloud.ovh.net/v1/AUTH_63234f509d6048bca3c9fd7928720ca1/ppk-lidar/SP/LHD_FXX_1042_6297_PTS_C_LAMB93_IGN69.copc.laz"
filename = "test.copc.laz"
download.file(url, filename)
las = readLAS(filename)
nlas = normalize_height(las, tin()) |> filter_poi(Classification == 5)

I get the following warning: "Interpolation of 1429 points failed because they are too far from ground points. Nearest neighbor was used but interpolation is weak for those points", but no crash, and an empty point cloud as expected.

Subsidiary question: in order to find the problematic file, I had to replace the single pipeline run over many files with a for loop over the files, printing the current file name before running the pipeline on it. Is there a better way to see which file is currently being processed, so that when there’s a crash I can tell which file was the problematic one?
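For reference, the workaround described above can be sketched in base R (with a hypothetical `process_file` stand-in for the actual lasR pipeline call). Appending the file name to a log on disk before each run means that even after a hard crash, the last line of the log identifies the offending file:

```r
# Minimal sketch: log each file name before processing it, so the log
# survives a session abort and its last entry points at the culprit.
log_path <- tempfile(fileext = ".log")

process_with_log <- function(files, process_file) {
  for (f in files) {
    # Write and flush the name *before* processing, so the entry
    # exists even if process_file() kills the R session.
    cat(paste0(f, "\n"), file = log_path, append = TRUE)
    process_file(f)
  }
}

# Usage with a dummy processor:
process_with_log(c("tile_a.copc.laz", "tile_b.copc.laz"),
                 function(f) invisible(NULL))
readLines(log_path)  # last entry = last file attempted
```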

r$> packageVersion("lasR")
[1] ‘0.5.3’

r$> Sys.info()[c('sysname', 'release')]
                sysname                 release 
                "Linux" "6.8.8-300.fc40.x86_64" 

Jean-Romain commented 2 weeks ago

What failed is the new feature I added in 0.5.3. It crashed in delete_points when freeing and reallocating the memory. I will fix it asap.

Is there a better way to see which file is currently processed, so that when there’s a crash,

Use exec(..., verbose = TRUE) and run your script in a terminal (otherwise RStudio won't last long enough to let you see what is printed).

cedricr commented 2 weeks ago

Thanks a lot!

Jean-Romain commented 2 weeks ago

Fixed.

That being said, if I remove the normalize step, the pipeline doesn’t crash…

This is because without normalize the pipeline can be streamed, i.e. no memory is allocated and points are processed one by one through the pipeline without loading the entire point cloud.
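To make the contrast concrete, here are the two variants side by side, using only the stages from the repro above. Whether lasR actually streams a given pipeline is decided internally by the library, so this is only an illustration of the explanation given here, not a guaranteed behavior:

```r
library(lasR)

# Streamable: points can flow through the pipeline one by one,
# so an empty result never triggers the reallocation bug.
streamed <- reader_las() + delete_points(keep_class(5))

# Not streamable: normalize() needs the whole point cloud in memory,
# so delete_points() later frees/reallocates a loaded (here empty) cloud,
# which is where the 0.5.3 crash occurred.
buffered <- reader_las() + normalize() + delete_points(keep_class(5))
```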

cedricr commented 2 weeks ago

Oh, that was fast! Thank you so much.