r-lidar / lidR

Airborne LiDAR data manipulation and visualisation for forestry application
https://CRAN.R-project.org/package=lidR
GNU General Public License v3.0
582 stars 130 forks source link

Filter Overlapped PointSourceID #755

Closed ouroukhai closed 4 months ago

ouroukhai commented 4 months ago

Hello lidR team, Thank you for the incredible package you provide!

I was searching recently about filter options regarding overlapping flight lines (or PointSourceID) and discovered this post:

https://gis.stackexchange.com/questions/272249/how-to-filter-point-cloud-by-flightline-in-r

Since this post is a bit outdated, grid_metrics() was used instead of pixel_metrics(), I was wondering if some function was further included to achieve this task.

All the best,

Jean-Romain commented 4 months ago

No, sorry. Nothing has been added to filter points specifically in the overlapping region.

ouroukhai commented 4 months ago

Hello again, I worked on the matter and modified the solution (cudos to Aaron and JRR) discussed in the link mentioned above:


domPSID.LAS <-  function(las, res = 20) {
  Rcpp::sourceCpp(code = '
                    #include <Rcpp.h>
                    using namespace Rcpp;

                    // [[Rcpp::export]]
                    NumericVector Mode(NumericVector x) {
                        if (is_true(all(is_na(x)))) {
                            // If all values are NA, return NA
                            return NumericVector::create(NA_REAL);
                        }

                        std::map<double, int> freq;
                        for (double num : x) {
                            if (!NumericVector::is_na(num)) {
                                freq[num]++;
                            }
                        }

                        int max_count = 0;
                        double mode = NA_REAL;  // Default to NA if no mode is found
                        for (const auto& kv : freq) {
                            if (kv.second > max_count) {
                                max_count = kv.second;
                                mode = kv.first;
                            }
                        }

                        return NumericVector::create(mode);
                    }
                    ')
  PSID <-
    pixel_metrics(las, ~  Mode(PointSourceID), res)  #obtain most represented flighline in segments of res
  las <-
    merge_spatial(las, PSID, "PSID")       #Add a new attribute to each point of the most represented PSID in the res segment where it is
  las <-
    filter_poi(las, PointSourceID == PSID)    #filter cloud
  las$PSID <- NULL
  return(las)
}

domPSID.ctg <-  function(chunk, res)
{
  las <-  readLAS(chunk)
  if (is.empty(las)) return(NULL)

  las <- domPSID.LAS(las, res)                        
  return(las)                                  
}

ctg_noolaps <- catalog_apply(ctg, FUN = domPSID.ctg, res = 20)

The code implements a Mode() function in C++ and then uses this function to calculate the mode over pixels of resolution res. For a $43km^2$ region, it did $\approx 5'$ (naive calculation by looking at the time) for res = 20, on a processor: 12th Gen Intel(R) Core(TM) i7-1265U 1.80 GHz, with 32 GB RAM using 4 cores in parallel.

My current issue is compiling the Mode() outside domPSID.LAS to reduce the run time. The results: image

Please let me know if you have any guidance on the matter. All the best,

EDIT: I did change the last part to:

option <- list(automerge = TRUE, need_buffer = TRUE)
ctg_noolaps <- catalog_apply(ctg, 
                              FUN = domPSID.ctg, 
                              res = 20, 
                              .options = option)

for better output (otherwise, a list with the file names will be returned).

EDIT2: Finally I worked with no buffer:

ctg@chunk_options$buffer <- 0
plan(multisession, workers = 4L)
option <- list(automerge = TRUE, need_buffer = F)
ctg_noolaps <- catalog_apply(ctg, 
                              FUN = domPSID.ctg, 
                              res = 20, 
                              .options = option)

since the result had some overlapped tiles.

Jean-Romain commented 4 months ago

Nothing special to say about it. Using pixel_metrics and mode to rasterize the most represented flightline is a simple approach that works. On my side I don't like the raster-based approach and want something more accurate/general for lidR. Typically the union of polygons or something like that but is very complex is the general case (more than 2 PSID overlapping, complex shape in the intersection and so on). This is why I never worked on it.

ctg_noolaps <- catalog_map(ctg, FUN = domPSID.ctg, res = 20)
ouroukhai commented 4 months ago

Is there a more efficient way to use the catalog engine by compiling the function Mode() only once?

Jean-Romain commented 4 months ago

Put your c++ code outside the function

ouroukhai commented 4 months ago

That was my initial code, but I got this error: NULL value passed as symbol address, and I assumed that it was occurring because the Mode() function was run in a different environment inside the catalog_apply and could not be recognized. When I changed to this, everything ran smoothly.

Jean-Romain commented 4 months ago

This is because you are processing in parallel. The C++ function cannot be exported to each individual worker. You have several options

  1. work on linux with a multicore plan
  2. create a package with your mode
  3. write mode in R
  4. find a package than have a mode function
ouroukhai commented 4 months ago

Thank you very much for your help!