r-lidar / lidR

Airborne LiDAR data manipulation and visualisation for forestry applications
https://CRAN.R-project.org/package=lidR
GNU General Public License v3.0

LasCatalog: several crashes with different dataset #761

Closed ErwannHouzay closed 2 months ago

ErwannHouzay commented 2 months ago

Hello;

This is probably a misuse of lidR.

I have a bunch of LAS files coming from a UAV acquisition. My goal is to extract a CHM from them. Here is the processing

(notice that I'm using a sequential plan; I tried to use multisession and things are even worse)

  #plan(multisession, workers = 4L)
  plan(sequential)
  set_lidr_threads(4L)

  #initialize LAS catalog
  print(outputDirProcessing)
  ctg <- readLAScatalog(tilesFolder)

  #classify ground (here we assume the DTM will fit in memory, but we could work by tiles and save to disk)
  opt_output_files(ctg) <- paste0(outputDirProcessing, "{*}_classified")
  mycsf <- csf(sloop_smooth = FALSE, class_threshold = 0.2, cloth_resolution = 0.2, rigidness = 2)
  classified_ctg  <- classify_ground(ctg, mycsf)

  #generate DTM
  opt_output_files(ctg) <- paste0(outputDirProcessing, "{*}_dtm")
  dtm_tin <- rasterize_terrain(ctg, res = 0.3, algorithm = tin(), package="terra")
  terra::writeRaster(dtm_tin, outputDTMPath, overwrite=TRUE)

  #generate normalized heights
  opt_output_files(ctg) <- paste0(outputDirProcessing, "{*}_normalized")
  ctg_norm <- normalize_height(ctg, dtm_tin)

  #cleanup values
  opt_filter(ctg_norm) <- "-drop_z_below 0"

  #rasterize canopy and clean up
  opt_output_files(ctg_norm) <- paste0(outputDirProcessing, "{*}_chm")
  chm <- rasterize_canopy(ctg_norm, res = 0.3, p2r(subcircle = 0.3), pkg = "terra", overwrite=TRUE)

  fill.na <- function(x, i=5) { if (is.na(x)[i]) { return(mean(x, na.rm = TRUE)) } else { return(x[i]) }}
  w <- matrix(1, 3, 3)
  filled <- terra::focal(chm, w, fun = fill.na)
  smoothed <- terra::focal(chm, w, fun = mean, na.rm = TRUE)
  terra::writeRaster(smoothed, outputCHMPath, overwrite=TRUE)

Due to the size of the files (15 GB), I split them into small tiles (150 m x 150 m). This is done using the laspy API.

Data resolution is 5 cm, but for now a 30 cm CHM is largely enough for me.

I have several datasets and some of them are literally crashing R. For others, after ground classification, the DTM extraction fails on all the tiles and then the computation hangs. Still, when I open the classified LAS tiles they look OK.

I'm not an expert in R and I don't know how to debug or provide more information.

I can indeed share some files if needed...

My feeling is that I probably missed something. What?

Could you help me?

Jean-Romain commented 2 months ago

Please share a file that crashes. A crash is never the user's responsibility, by the way. It is always a bug on the developer's side.

ErwannHouzay commented 2 months ago

Thanks for your support ;-)

First file (tell me when you get it; I will leave it up until 17:00 Paris time max):

https://drive.google.com/file/d/17pU0OG4tQQ6QzR4JIkayj3WcS-x6vy7y/view?usp=drive_link

I just tried to tile the file with lidR (200 x 200 m) and I get a similar error when creating the DTM, and all the tiles are grey (empty...): any_list[[1]] : subscript out of bounds

As you can see, the ground classification for each tile is OK...

Jean-Romain commented 2 months ago

The ground classification is working? So where does it crash? Please report a minimal reproducible example with this file.

ErwannHouzay commented 2 months ago

Here is the code used with the retiled file.

require(lidR)
require(vctrs)
require(future)
require(RCSF)

segment_LIDAR <- function(tilePath,tileName,outputDir,tilesDir)
{
    outputDirProcessing=paste(paste(outputDir,tileName,sep=""),"/",sep="")

    outputDTMPath=paste(outputDir,paste(tileName,"_DTM.tif",sep=""),sep="")
    outputCHMPath=paste(outputDir,paste(tileName,"_CHM.tif",sep=""),sep="")

    tilesFolder=paste(tilesDir,tileName,sep="")

    plan(sequential)
    set_lidr_threads(4L)
    orig_ctg = readLAScatalog(tilePath)

    # Create a new set of 200 x 200 m .las files
    opt_chunk_buffer(orig_ctg) <- 0
    opt_chunk_size(orig_ctg) <- 200
    opt_output_files(orig_ctg) <- paste0(tilesFolder, "/retile_{XLEFT}_{YBOTTOM}")

    ctg = catalog_retile(orig_ctg)

    #classify ground (here we assume the DTM will fit in memory, but we could work by tiles and save to disk)
    opt_output_files(ctg) <- paste0(outputDirProcessing, "/classified_{XLEFT}_{YBOTTOM}")
    mycsf <- csf(sloop_smooth = FALSE, class_threshold = 0.2, cloth_resolution = 0.2, rigidness = 2)
    classified_ctg  <- classify_ground(ctg, mycsf)

    #generate DTM
    opt_output_files(ctg) <- paste0(outputDirProcessing, "/dtm_{XLEFT}_{YBOTTOM}")
    dtm_tin <- rasterize_terrain(ctg, res = 0.3, algorithm = tin(), package="terra")
    terra::writeRaster(dtm_tin, outputDTMPath, overwrite=TRUE)

    #generate normalized heights
    opt_output_files(ctg) <- paste0(outputDirProcessing, "/normalized_{XLEFT}_{YBOTTOM}")
    ctg_norm <- normalize_height(ctg, dtm_tin)

    #cleanup values
    opt_filter(ctg_norm) <- "-drop_z_below 0"

    #rasterize canopy and clean up
    opt_output_files(ctg_norm) <- paste0(outputDirProcessing, "/chm_{XLEFT}_{YBOTTOM}")
    chm <- rasterize_canopy(ctg_norm, res = 0.3, p2r(subcircle = 0.3), pkg = "terra", overwrite=TRUE)

    fill.na <- function(x, i=5) { if (is.na(x)[i]) { return(mean(x, na.rm = TRUE)) } else { return(x[i]) }}
    w <- matrix(1, 3, 3)
    filled <- terra::focal(chm, w, fun = fill.na)

    w <- matrix(1, 3, 3)
    smoothed <- terra::focal(chm, w, fun = mean, na.rm = TRUE)
    terra::writeRaster(smoothed, outputCHMPath, overwrite=TRUE)
}

tilePath='Z2PAR3.las'

tileName='Z2PAR3'
tilesDir='../Drone2023Tiled/'
outputDir='../Drone2023Res/'

segment_LIDAR(tilePath,tileName,outputDir,tilesDir)

Ground classification works / DTM extraction produces empty outputs.

Jean-Romain commented 2 months ago

This is not a minimal reproducible example. Please remove everything that is not necessary, to reduce your example to 5 or 10 lines of code. And what is the problem? Did it crash? Or did it produce empty files? It is unclear right now.

ErwannHouzay commented 2 months ago

This is a minimal processing pipeline we should be able to achieve with your tool. Let's say:

The problems seem to appear when combining the individual steps.

Jean-Romain commented 2 months ago

This code will take ages to finish. You shared a 10 GB file (!!!). I will never be able to debug that if I don't know exactly what is supposed to fail and where it is supposed to fail. So far I don't even know where it crashes and where it produces empty files. Your file is so big that I can't even download it (it failed, I have to retry).

So, I'm asking again: please investigate on your side and produce a minimal reproducible example. Be specific about what the issues are and where they occur, so I know what to expect and where to search for a bug. I certainly do not need to run the canopy smoothing, so that is an easy 6 unnecessary lines.
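
For illustration, a minimal reproducible example here could be as small as this (hypothetical file name, one tile, only the step that fails):

library(lidR)
las <- readLAS("one_failing_tile.las")  # hypothetical: a single tile that reproduces the problem
las <- classify_ground(las, csf(sloop_smooth = FALSE, class_threshold = 0.2, cloth_resolution = 0.2, rigidness = 2))
dtm <- rasterize_terrain(las, res = 0.3, algorithm = tin())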

Thank you

ErwannHouzay commented 2 months ago

Below is the code with the unnecessary lines removed.

require(lidR)
require(vctrs)
require(future)
require(RCSF)

segment_LIDAR <- function(tilePath,tileName,outputDir,tilesDir)
{
    plan(sequential)
    set_lidr_threads(4L)

    outputDirProcessing=paste(paste(outputDir,tileName,sep=""),"/",sep="")

    outputDTMPath=paste(outputDir,paste(tileName,"_DTM.tif",sep=""),sep="")
    outputCHMPath=paste(outputDir,paste(tileName,"_CHM.tif",sep=""),sep="")

    tilesFolder=paste(tilesDir,tileName,sep="")

    orig_ctg = readLAScatalog(tilePath)

    # Create a new set of 200 x 200 m .las files
    opt_chunk_buffer(orig_ctg) <- 0
    opt_chunk_size(orig_ctg) <- 200
    opt_output_files(orig_ctg) <- paste0(tilesFolder, "/retile_{XLEFT}_{YBOTTOM}")
    ctg = catalog_retile(orig_ctg)

    #classify ground (here we assume the DTM will fit in memory, but we could work by tiles and save to disk)
    opt_output_files(ctg) <- paste0(tilesFolder, "/classified_{XLEFT}_{YBOTTOM}")
    mycsf <- csf(sloop_smooth = FALSE, class_threshold = 0.2, cloth_resolution = 0.2, rigidness = 2)
    classified_ctg  <- classify_ground(ctg, mycsf)

    #generate DTM
    opt_output_files(ctg) <- paste0(tilesFolder, "/dtm_{XLEFT}_{YBOTTOM}")
    dtm_tin <- rasterize_terrain(ctg, res = 0.3, algorithm = tin(), package="terra")
    terra::writeRaster(dtm_tin, outputDTMPath, overwrite=TRUE)
}

tilePath='Z2PAR3.las'
tileName='Z2PAR3'
tilesDir='/Drone2023Tiled/'
outputDir='/Drone2023Res/'

segment_LIDAR(tilePath,tileName,outputDir,tilesDir)

If you never tested big files, then maybe I'm trying to use a tool that was not designed to scale up? Have you tested on big data? This file is the smallest one; we are closer to 50 GB.

Just tell me if you've done such performance tests... If not, then it's a matter of memory corruption, because honestly, depending on the run path, I get different behaviour.

Jean-Romain commented 2 months ago

It has been designed to handle big files. But handling big files requires some skills and preparation. Did you create a spatial index of your files? Your files are particularly big and, yes, I've never encountered 10 or 50 GB files. Let me finish the download so I can see the potential troubles and have some material to say something.

ErwannHouzay commented 2 months ago

I have a question on this specific point. Maybe the trap is here.

When I retile, I generate a catalog: ctg = catalog_retile(orig_ctg)

Then if I want to run a process on each tile, how do I do it?

I tried: opt_output_files(ctg) <- paste0(tilesFolder, "{*}_classified"), but then the system said that for a chunked file ORIGINALFILENAME does not exist.

So I ended up with this syntax: opt_output_files(ctg) <- paste0(tilesFolder, "/classified_{XLEFT}_{YBOTTOM}"), but then I start to have issues.

Currently I'm doing the following test:

Can you clarify? This is not super clear for me...

Jean-Romain commented 2 months ago

Your code looks good and is the correct way to do it.
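
For reference, a sketch of how the two naming templates differ (outdir is a placeholder); my reading of the engine is that {*} expands to the original file name, which only exists when processing file by file, whereas chunk-based processing must use the chunk coordinates:

# processing file by file: the original file name is known, so {*} works
opt_chunk_size(ctg) <- 0
opt_output_files(ctg) <- paste0(outdir, "/{*}_classified")

# processing by 200 m chunks: there is no single source file, use the chunk coordinates
opt_chunk_size(ctg) <- 200
opt_output_files(ctg) <- paste0(outdir, "/classified_{XLEFT}_{YBOTTOM}")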

I got the file. It contains 374 million points; this is reasonable. Give me some time to run some tests.

ErwannHouzay commented 2 months ago

The units are meters. The file is in a coordinate reference system (so with big coordinates).

Jean-Romain commented 2 months ago

Oh, I got it! Look:

opt_output_files(ctg) <- paste0(tempdir(), "/dtm_{XLEFT}_{YBOTTOM}")
dtm_tin <- rasterize_terrain(ctg, res = 0.3, algorithm = tin(), package="terra")
terra::writeRaster(dtm_tin, outputDTMPath, overwrite=TRUE)

You used the non-classified collection ctg instead of classified_ctg, so you do not have ground points. Otherwise everything looks good so far.

Yet, there are other issues: some minor issues, some improvements, some drastic speed improvements. I'll be back in 1 or 2 hours with an in-depth reply.

ErwannHouzay commented 2 months ago

Multiple code changes... and probably a mess-up at the end.

I will wait for your feedback then. I'm pretty sure you can do something...

Jean-Romain commented 2 months ago
library(lidR)
orig_ctg = readLAScatalog("~/Téléchargements/CHM_Issue.las")

Don't use catalog_retile

You never need catalog_retile in lidR. If you need to process small chunks, the retiling will be performed on the fly. catalog_retile exists if you need to retile for use in other software (see later). So here you can save several minutes of computation.

Spatial indexing

You have a big file, you want to process small chunks, so you need to load a buffer. You need a spatial index for your las file in order to perform fast spatial queries. I recommend using lasindex from LAStools, but you can use lidR:::catalog_laxindex(). This will take some extra time but will speed up everything later.

lidR:::catalog_laxindex(orig_ctg)
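
A small sketch to avoid rebuilding the index on subsequent runs (assuming the catalog exposes its file paths via $filename):

lax <- gsub("\\.la[sz]$", ".lax", orig_ctg$filename, ignore.case = TRUE)
if (!all(file.exists(lax))) lidR:::catalog_laxindex(orig_ctg)  # only index if .lax files are missing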

Classification

Here you can chunk on the fly. 200 m sounds good with this point density.

opt_chunk_buffer(orig_ctg) <- 10 # 10 meters buffer
opt_chunk_size(orig_ctg) <- 200
opt_output_files(orig_ctg) <- paste0(tempdir(), "/classified_{XLEFT}_{YBOTTOM}")
mycsf <- csf(sloop_smooth = FALSE, class_threshold = 0.2, cloth_resolution = 0.2, rigidness = 2)
classified_ctg  <- classify_ground(orig_ctg, mycsf)

DTM

You now have a tiled dataset without a spatial index

lidR:::catalog_laxindex(classified_ctg)

Compute the DTM

opt_chunk_buffer(classified_ctg) <- 10
opt_chunk_size(classified_ctg) <- 0 # processing by file
opt_output_files(classified_ctg) <- paste0(tempdir(), "/dtm_{XLEFT}_{YBOTTOM}")
dtm_tin <- rasterize_terrain(classified_ctg, res = 0.3, algorithm = tin())

Parallelization

Add this at the beginning. I did not test it, but I see no reason for it to fail except memory limits (see below).

library(future)
plan(multisession, workers = 2)
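
A rough sizing sketch (my assumption, not an official guideline): each worker holds its own chunk in memory, so derive the worker count from the RAM you can spare.

library(future)
ram_gb <- 32                   # assumption: adjust to your machine
gb_per_worker <- 10            # rough estimate for this point density
workers <- max(1L, min(availableCores() - 1L, floor(ram_gb / gb_per_worker)))
plan(multisession, workers = workers)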

Full code

I did not benchmark but I can easily predict a significant speed up.

library(lidR)
#library(future)
#plan(multisession, workers = 2)

orig_ctg = readLAScatalog("~/Téléchargements/CHM_Issue.las")
plot(orig_ctg)

lidR:::catalog_laxindex(orig_ctg)

opt_chunk_buffer(orig_ctg) <- 10 # 10 meters buffer
opt_chunk_size(orig_ctg) <- 200
opt_output_files(orig_ctg) <- paste0(tempdir(), "/classified_{XLEFT}_{YBOTTOM}")
mycsf <- csf(sloop_smooth = FALSE, class_threshold = 0.2, cloth_resolution = 0.2, rigidness = 2)
classified_ctg  <- classify_ground(orig_ctg, mycsf)

lidR:::catalog_laxindex(classified_ctg)

opt_chunk_buffer(classified_ctg) <- 10
opt_chunk_size(classified_ctg) <- 0 # processing by file
opt_output_files(classified_ctg) <- paste0(tempdir(), "/dtm_{XLEFT}_{YBOTTOM}")
dtm_tin <- rasterize_terrain(classified_ctg, res = 0.3, algorithm = tin())

lasR

Your pipeline has nothing fancy at all. I mean, it is a basic pipeline with standard operations. lidR was designed to try and explore fancy and original processing. While lidR is capable of applying a standard pipeline like yours, it is not optimal for that. Its internal design is made to allow R users to do almost anything, and this comes at some performance cost. You should have a look at my new lasR package which is, according to some tests we performed, the most powerful software on the market. It is not as versatile as lidR because it focuses on performance over versatility, but for your use case it will beat lidR significantly. Your pipeline being very standard, you can write it with lasR:

library(lasR)

gnd = classify_with_csf(slope_smooth = FALSE, class_threshold = 0.2, cloth_resolution = 0.2, rigidness = 2)
tri = triangulate(filter = keep_ground())
dtm = rasterize(0.3, tri)
norm = transform_with(dtm)
write = write_las()

pipeline = reader_las() + gnd + tri + dtm + norm + write

ans = exec(pipeline, on = "~/Téléchargements/CHM_Issue.las", with = list(chunk = 200, ncores = 1, progress = TRUE))

Session crash

I bet that the crash arose because you ran out of memory. Your point cloud is very dense and I guess the CSF produces A LOT of ground points, thus the triangulation of the ground points produces A LOT of triangles. lasR is much more memory-optimized than lidR, yet my session crashed because I ran out of memory when processing with 4 cores. I see no other explanation; I was not able to reproduce a crash in another way. So I applied the pipeline on a single core with lasR. lidR uses much more memory than lasR, thus I'm pretty sure this explains the crash when using future, even when processing 200 m chunks. I don't know the specs of your computer, but you are processing massive data, so you must be aware of what you are doing and be in control of your processing.

I have 32 GB of RAM and I can't run your pipeline with more than 2 cores in lasR, so I guess the max is 1 core with lidR. lasR is still in development.

ErwannHouzay commented 2 months ago

I was sure that I was not using lidR correctly!

Let me re-test with your dual proposal. My computer has 128 GB of memory... but agreed about the crash origin.

I will give feedback in a while.

Probably lasR is more optimized for our usage ;-) I will test both solutions then!

Jean-Romain commented 2 months ago

After testing the lasR pipeline, it crashed somewhere beyond 75%. I will investigate why. lasR is still in development and is not considered fully stable, but the goal is to become professional software with professional support based on funding. The software itself remains open source and free. But not my time.

https://github.com/r-lidar/lasR

ErwannHouzay commented 2 months ago

Thank you for testing lasR.

At this stage we are doing R&D, but later we will move to production. I will focus on lidR, test the files without mixing the pipelines, and let you know.

ErwannHouzay commented 2 months ago

I tested the lidR approach. This is much better.

Now I'm struggling with

opt_filter(ctg_norm) <- "-drop_z_below 0" — it looks like it loads everything into memory.

Do I need the following before? opt_chunk_buffer(ctg_norm) <- 10; opt_chunk_size(ctg) <- 200

Jean-Romain commented 2 months ago
opt_filter(ctg_norm) <- "-drop_z_below 0"

Should work. What makes you think that it does not?

 opt_chunk_size(ctg) <- 200

You don't need that. You can put 0. Your dataset is already tiled in 200 m tiles.

ErwannHouzay commented 2 months ago

Ok clear.

I had the impression that memory consumption was blowing up on another dataset.

I will run it on several files now and give feedback afterwards.

Jean-Romain commented 2 months ago

In theory -drop_z_below 0 should remove only a tiny fraction of the points: a few minor outliers where inaccuracies occurred. In terms of memory I'm expecting virtually zero effect.

ErwannHouzay commented 2 months ago

Yes, there is some noise in our clouds. It would be nice to know the number of points removed. Should I use the point_metrics function before and after, or is there a better option?

I have this code chunk for now to complete the CHM extraction:

ctg = readLAScatalog("cloud3D.las")

lidR:::catalog_laxindex(ctg)

opt_chunk_buffer(ctg) <- 10 # 10 meters buffer
opt_chunk_size(ctg) <- 150
opt_output_files(ctg) <- paste0(workdir, "/classified_{XLEFT}_{YBOTTOM}")

mycsf <- csf(sloop_smooth = FALSE, class_threshold = 0.2, cloth_resolution = 0.2, rigidness = 2)
classified_ctg  <- classify_ground(ctg, mycsf)

lidR:::catalog_laxindex(classified_ctg)

opt_chunk_buffer(classified_ctg) <- 10
opt_chunk_size(classified_ctg) <- 0 # processing by file
opt_output_files(classified_ctg) <- paste0(workdir, "/dtm_{XLEFT}_{YBOTTOM}")
dtm_tin <- rasterize_terrain(classified_ctg, res = 0.3, algorithm = tin())
terra::writeRaster(dtm_tin, outputDTMPath, overwrite=TRUE)

#generate normalized heights
opt_output_files(ctg) <- paste0(workdir, "/normalized_{XLEFT}_{YBOTTOM}")
opt_chunk_buffer(ctg) <- 10
opt_chunk_size(ctg) <- 0 # processing by file
ctg_norm <- normalize_height(ctg, dtm_tin)

lidR:::catalog_laxindex(ctg_norm)

#rasterize canopy and clean up
opt_chunk_size(ctg) <- 0
#cleanup values
opt_filter(ctg_norm) <- "-drop_z_below 0"

lidR:::catalog_laxindex(ctg_norm)

opt_output_files(ctg_norm) <- paste0(workdir, "/chm{XLEFT}_{YBOTTOM}")
opt_chunk_buffer(ctg_norm) <- 10
opt_chunk_size(ctg) <- 200
chm <- rasterize_canopy(ctg_norm, res = 0.3, p2r(subcircle = 0.3), pkg = "terra", overwrite=TRUE)

fill.na <- function(x, i=5) { if (is.na(x)[i]) { return(mean(x, na.rm = TRUE)) } else { return(x[i]) }}
w <- matrix(1, 3, 3)
filled <- terra::focal(chm, w, fun = fill.na)
smoothed <- terra::focal(chm, w, fun = mean, na.rm = TRUE)
terra::writeRaster(smoothed, outputCHMPath, overwrite=TRUE)

Next step is to test tree crown extraction based on this CHM. This is probably straightforward now.
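
A possible starting point for that next step (the window size is a guess for a 0.3 m CHM, not a recommendation):

ttops <- locate_trees(smoothed, lmf(ws = 3))       # tree tops from the smoothed CHM
plot(smoothed)
plot(sf::st_geometry(ttops), add = TRUE, pch = 3)  # overlay the detected tops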

Jean-Romain commented 2 months ago

For noise there is classify_noise
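
A minimal sketch on a single LAS object (the sor() parameters are only guesses to tune for 5 cm UAV data); it also gives the count of removed points asked above:

las <- classify_noise(las, sor(k = 15, m = 7))      # flag outliers as class 18 (LASNOISE)
n_removed <- sum(las$Classification == LASNOISE)    # how many points will be dropped
las <- filter_poi(las, Classification != LASNOISE)  # remove them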

ErwannHouzay commented 2 months ago

A few comments:

For 20 files, processing went perfectly, so I guess we were simply not using lidR correctly. Still, if you have some recommendations about memory optimization, I'm interested!

Jean-Romain commented 2 months ago

Edit: this is my initial answer before actually trying; see my next answer for details on the processing and parameters.

lidR is an R package and is sometimes limited by how R itself behaves. It uses a lot of memory by design to load the points into an R data.frame, which is the best data structure for exposing data to R users but a poor structure for spatial data from a pure programming standpoint.

For example, your 30 GB file is likely to be a 60 to 80 GB object once loaded in R. This is not an issue with lidR but a limitation of R itself, which has only two data types (32-bit int and 64-bit float). Add some internal partial copies because an R data.frame is not an efficient data structure for processing a point cloud, and because R is prone to creating copies of its objects by design, you end up with 120 GB of memory usage easily. How R allocates and releases memory is complex, and developers have no control over it.
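
As a back-of-the-envelope illustration (the attribute counts below are assumptions, not a measurement of this file):

n_points  <- 374e6  # points in the shared file
n_doubles <- 4      # e.g. X, Y, Z, gpstime stored as 8-byte doubles in an R data.frame
n_ints    <- 8      # e.g. Intensity, ReturnNumber, Classification, R, G, B, ... as 4-byte integers
n_points * (n_doubles * 8 + n_ints * 4) / 1e9  # ~24 GB for the bare data.frame, before any copies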

lidR is strongly optimized for an R package (notably, it recycles memory whenever possible) but remains an R package that respects how R works. This is the price of creating an R package that behaves like all the other R packages in the R ecosystem. You may have reached some limits of lidR. As I said, lidR was not designed as a production tool but rather as R&D software for R programmers to try new algorithms, experiment with new ideas, and develop new concepts within the R environment. Dealing with very large amounts of spatial data within R is not simple and is not only a problem with lidR.

That being said, does it mean it is not possible? I don't know; I need to test it myself actually. But lidR may indeed not be the best tool for such a challenge. Again, this is why I designed lasR. lasR is standalone C++ software, and the R part is just the API call, so lasR does not have to deal with how R works. In lasR, I have full control over memory layout, allocation, and deallocation. In lasR, a 30 GB file loaded into memory takes 30 GB of memory because it is not written in R but in C++ only. The drawback is that you can't "play" with your datasets in a data.frame using the 10,000 available and compatible R packages. lasR is more powerful because it is not really a standard R package, but less versatile because it is not really a standard R package. A matter of trade-off.

I have never encountered a 30 GB file. If you are willing to share it, I'll give it a try with both lidR and lasR, and maybe other software, to see if other software can handle it better. If something can be optimized from the user standpoint, I'll tell you; if something can be optimized internally, I'll do it. I'll try to process it with my 32 GB of RAM memory.

Jean-Romain commented 2 months ago

So I gave it a try on your 10 GB file.

Pre-test with lidR

I previously suggested that the CSF would produce a colossal number of ground points. Consequently, the triangulation of the ground points would produce a colossal number of triangles and consume a huge amount of memory. I checked it, and I was correct. The CSF is producing 300 ground points/m², which is absurdly high regardless of your end use.

library(lidR)
ctg = readLAScatalog("/home/jr/Téléchargements/CHM_Issue.las")
plot(ctg)
las = clip_rectangle(ctg, 594000, 9707000, 594100, 9707100)

th = 0.2
mycsf <- csf(sloop_smooth = FALSE, class_threshold = th, cloth_resolution = 0.2, rigidness = 2)
las  <- classify_ground(las, mycsf)

gnd = filter_ground(las)
plot(gnd)
gnd
#> class        : LAS (v1.2 format 2)
#> memory       : 136 Mb 
#> extent       : 594000, 594100, 9707000, 9707100 (xmin, xmax, ymin, ymax)
#> coord. ref.  : NA 
#> area         : 10000 units²
#> points       : 2.97 million points
#> density      : 297.15 points/units²
#> density      : 297.15 pulses/units²

I suggest setting class_threshold to 1 cm (0.01) or even 1 mm (0.001). These settings produce 45 and 5 ground points/m², respectively. 5 ground points/m² is already a lot in many LiDAR datasets, but let's try with 45. 45 ground points/m² is huge. You are producing a 0.3 m DTM, i.e., ~10 pixels/m² from a 45 points/m² dataset. Under these conditions, it would be better to sample the lowest points per pixel rather than performing a costly triangulation and interpolation.
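
A hypothetical sketch of that "lowest ground point per pixel" idea (not a tested replacement for rasterize_terrain()):

gnd <- filter_ground(las)                      # las is the classified clip from above
low <- pixel_metrics(gnd, ~min(Z), res = 0.3)  # lowest ground elevation per 0.3 m pixel
plot(low)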

Processing with lasR

Anyway, I did it with class_threshold = 0.01. Using lasR, I was able to process your 10 GB point cloud using a maximum of ~10 GB of memory (single core). The lasR triangulation is more performant and memory efficient than in lidR. I did not try with lidR. The overall process (classification with CSF, triangulation of ground points, interpolation of the triangulation for normalization + DTM) took 7 minutes on one core.

Code source with lasR:

library(lasR)
gnd = classify_with_csf(slope_smooth = FALSE, class_threshold = 0.01, cloth_resolution = 0.2, rigidness = 2)
tri = triangulate(filter = "-keep_class 2")
dtm = rasterize(0.3, tri)
norm = transform_with(dtm)
write = write_las(ofile = "/home/jr/Téléchargements/*_normalized.las")
pipeline = reader_las() + gnd + tri + dtm + norm + write
ti = Sys.time()
ans = exec(pipeline, on = "~/Téléchargements/CHM_Issue.las", with = list(chunk = 200, progress = TRUE))
tf = Sys.time()
difftime(tf, ti)

Notice that I have some avenues for improvement, but I need time to develop them.

Processing with lidR

You can give it a try with lidR if you want. It should be slower and use more memory, but you have a big machine, so in the end, with 45 ground points/m², I don't see why it would fail.

Alternative strategy

Considering the massive amount of data and the super high density of ground points, I suggest using another processing strategy. The typical triangulation + interpolation process is standard because it is usually applied to datasets with 0.5 to 5 ground points/m². In your case, considering the colossal amount of data, I think you should use another strategy that does not involve an unnecessary spatial interpolation. That would be much faster and more efficient. Intuitively I'd say we could optimize to process the point cloud in 3 minutes using less than 2 GB of memory.

Other software

What about other important software on the market? PDAL? PDAL has the CSF; we are both using the same original author's code. For triangulation we are both using the same library as well. But, as far as I know, PDAL is not capable of processing multi-file dataset collections and is not able to split a single file into multiple chunks like lidR or lasR. Consequently, I predict (not tested) that PDAL would not have been able to complete this task on my computer. LAStools? LAStools has another algorithm for ground classification and is thus hardly comparable strictly. LAStools was able to perform the ground classification in 9 minutes using 10 GB of memory, producing 50 ground points/m². Then, the normalization took another 21 minutes and 5 GB of memory. I did not run the DTM.

As you can see, lidR and lasR definitely hold up well on the market when using an adequate strategy and adequate parameters. And for lasR I can easily create custom stages tailored to your needs on demand.

Jean-Romain commented 2 months ago

I also tried CloudCompare. Loading the point cloud in the software took 22 GB. I was too close to my RAM limit to be able to process it with this software.