r-lidar / lidR

Airborne LiDAR data manipulation and visualisation for forestry application
https://CRAN.R-project.org/package=lidR
GNU General Public License v3.0
596 stars 131 forks source link

point density (all returns) vs. point density (last only) #501

Closed wiesehahn closed 2 years ago

wiesehahn commented 2 years ago

The functions summary() and print() return a density value based on all points. In many cases the interest is instead on the pulse density / point density (last only).

I think the term "point density" is used synonym for both depending on the user. To my knowledge the USGS for example is referring to last only returns as "point density".

That's why I was confused in the beginning when I read the high point density value.

I would suggest to display both, similar to e.g. lastools: point density (all returns) and point density (last only)

Jean-Romain commented 2 years ago

Your are doing the distinction between point density and pulse density. I understand your request that makes sense. However it is more complex that you may think. Computing the point density is already a challenge because

  1. Computing the area of set that has mathematically an area of 0 is somehow subjective. See also this SE question
  2. Because it must be performed extremely fast. You don't want to wait 1 second to print(las).

I recently changed the way I'm computing the point density to copy LAStools after the above mentioned question. However the LAStools method is weak when the density of the point cloud is low or when there is no CRS recorded or even if the CRS is somewhat unconventional. And it is wrong if the CRS is longlat. I sightly improved the LAStools method by introducing alternative computation in some cases.

That is only for the point density. Now the pulse density: a las file does not record "pulses". Pulses must be retrieved either from the gpstime or from ReturnNumber and NumberOfReturn. It is not difficult in the general case but what if:

  1. There is no gpstime in the file
  2. The gpstime is not loaded because user used the select argument
  3. The gpstime is not populated (0 everywhere)
  4. The gpstime is incorrectly populated
  5. ReturnNumber and NumberOfReturn are not populated (e.g. terrestrial lidar)
  6. ReturnNumber and NumberOfReturn are incorrectly populated
  7. ReturnNumber and NumberOfReturn are not loaded because user used the select argument

LAStools does not have to handle cases 2 and 7 but I'm wondering how it deals with other cases. I'm pretty sure it returns invalid outputs. But in R the "tradition" is to be very very helpful and informative with users by handling every possible trouble nicely. In this case there is many cases to handle for a very little value added.

So I keep the PR open if one day I find some motivation at doing 95% of error handling work just to print a number. Meanwhile you can do it the simple way (without error handling):

# Assuming gpstime is valid
area <- area(las)
npulse <- data.table::uniqueN(las$gpstime)
npulse/area

# Assuming return number and number of returns are valid
area <- area(las)
npulse <- sum(las$ReturnNumber == las$NumberOfReturns)
npulse/area

Notice that in version 3.x.y area is the convex hull area. In version 4 it is the occupancy grid area (similar to lastools) if the point cloud is dense enough and have a valid CRS. Otherwise it falls back to other estimates

Jean-Romain commented 2 years ago

It now prints pulse density using first returns. It uses informatio from the header so it also works for LAScatalog

> las
#> class        : LAS (v1.2 format 1)
#> memory       : 6.2 Mb 
#> extent       : 684766.4, 684993.3, 5017773, 5018007 (xmin, xmax, ymin, ymax)
#> coord. ref.  : NAD83 / UTM zone 17N 
#> area         : 51572 m²
#> points       : 81.6 thousand points
#> density      : 1.58 points/m²
#> density      : 1.08 pulses/m²