Eleven canopy density metrics were derived. For each plot, the range of LiDAR height measurements was divided into 10 equal intervals and the cumulative proportion of LiDAR returns found in each interval, starting from the lowest interval (i.e., d1), was calculated. Since the last interval always sums to a cumulative probability of 1, it was excluded, resulting in 9 canopy density metrics (i.e., d1 ... d9).
The source code is
breaks <- seq(0, zmax, zmax/10)   # 10 equal-width layers from 0 to zmax
d <- findInterval(z, breaks)      # assign each return to a layer
d <- fast_table(d, 10)            # count returns per layer (internal lidR helper)
d <- d / sum(d) * 100             # convert counts to percentages
d <- cumsum(d)[1:9]               # cumulative percentages, dropping the last (always 100)
d <- as.list(d)
Are you suggesting it should be
breaks <- seq(zmin, zmax, (zmax-zmin)/10)
Yes, that's correct. breaks <- seq(zmin, zmax, (zmax-zmin)/10)
Currently, if you drop the lower heights when reading in the lidar, the lower density metrics (like zpcum1) can contain no returns or a reduced number. The ranges of the lower density metrics also can no longer be compared from plot to plot (when preparing for wall-to-wall modelling).
On the other hand, the current way of determining the breaks may be important if vertical tree structure follows a set pattern regardless of height. But in that case the data shouldn't be filtered.
So it's not a coding issue; it's a theoretical one for which I don't have an answer. However, my feeling is that your suggestion of setting the range (zmax - zmin) is the safer choice. It could handle lidar data with no filtering as well as filtered data.
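To make the problem concrete, here is a minimal base-R sketch (the simulated heights are made up, and tabulate() stands in for lidR's internal fast_table()):
set.seed(1)
z <- runif(1000, 2, 12)                  # heights already filtered below 2 m
zmax <- max(z)
breaks <- seq(0, zmax, zmax/10)          # current behaviour: layers start at 0
d <- findInterval(z, breaks)
d <- tabulate(pmin(d, 10), nbins = 10)   # counts per layer; the return at zmax is clamped into layer 10
cumsum(d / sum(d) * 100)[1:9]            # zpcum1 is 0 because the lowest layer is empty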
With the current interpretation, the following two distributions return the same metrics, which tell us that the points are evenly distributed from top to bottom. Your interpretation tells us that the points are evenly distributed (but doesn't tell where).
With the current interpretation, the following two distributions return different metrics, which tell us that the points are distributed at the top. Your interpretation returns the same values for both cases and tells us that the points are evenly distributed (but doesn't tell where).
Moreover, your interpretation gives the exact same values for all 4 cases, while mine can discriminate 3 of the 4 cases. Thus, to me, dividing into 10 layers from 0 to zmax is superior because it captures more information. Your interpretation may be interesting as well, and I understand your problem with height-threshold filtering, but I think you need custom metrics. For example, if you have a height threshold at 2 m:
breaks <- seq(2, zmax, (zmax-2)/10)
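A hedged sketch of what such a custom metric could look like with lidR's formula interface (my_zpcum, its th argument, and the use of tabulate() in place of the internal fast_table() are all assumptions, not lidR code):
my_zpcum <- function(z, th = 2)
{
  z <- z[z >= th]                            # keep only returns above the threshold
  zmax <- max(z)
  breaks <- seq(th, zmax, (zmax - th)/10)    # 10 equal layers between th and zmax
  d <- findInterval(z, breaks)
  d <- tabulate(pmin(d, 10), nbins = 10)     # counts per layer
  d <- cumsum(d / sum(d) * 100)[1:9]
  as.list(setNames(d, paste0("zpcum", 1:9)))
}
# e.g. metrics <- grid_metrics(las, ~my_zpcum(Z), res = 20)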
I might have incorrectly interpreted the definition of what the breaks are exactly, or how they're calculated. My thought was to replace the 0 in the breaks equation with zmin to accommodate input data, whether filtered or not. Using the data to establish the minimum (either 0 or 2) will always result in layers of equal width. Hopefully I can explain my interpretation. The diagrams below (I already see that the divisions are mismatched) show that setting the breaks from the minimum z (in my case 2 metres) results in layers from which we can still tell where the distribution is; in fact the layers are smaller, so there is a little more precision, and we avoid one layer that is only partly covered (e.g. zpcum2).
plot A with all lidar hits (ground = 0 metres, maxVeg = 12 metres)
minimum is set in the equation to 0
zmax <- 12
breaks <- seq(0, zmax, zmax/10)
print(breaks)
The result is:
[1] 0.0 1.2 2.4 3.6 4.8 6.0 7.2 8.4 9.6 10.8 12.0
plot A with all lidar hits (ground = 0 metres, maxVeg = 12 metres)
minimum is a variable and is set to 0 (i.e. minimum z value in data)
zmin <- 0
breaks <- seq(zmin, zmax, (zmax-zmin)/10)
print(breaks)
The result is the same:
[1] 0.0 1.2 2.4 3.6 4.8 6.0 7.2 8.4 9.6 10.8 12.0
plot A with lidar hits filtered at 2 metres (minimum data height = 2 metres, maxVeg = 12 metres)
minimum is a variable and is set to 2 (i.e. minimum z value in the data)
zmin <- 2
breaks <- seq(zmin, zmax, (zmax-zmin)/10)
print(breaks)
The result is:
[1] 2 3 4 5 6 7 8 9 10 11 12
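Carrying these breaks one step further, a sketch of the resulting cumulative percentages for the filtered plot under both definitions (hypothetical heights; tabulate() stands in for the internal fast_table()):
z <- runif(500, 2, 12)                     # plot A heights filtered at 2 m (made up)
cum_pct <- function(z, breaks) {
  d <- findInterval(z, breaks)
  d <- tabulate(pmin(d, 10), nbins = 10)
  round(cumsum(d / sum(d) * 100)[1:9], 1)
}
cum_pct(z, seq(0, 12, 12/10))              # breaks from 0: zpcum1 = 0, zpcum2 only partly filled
cum_pct(z, seq(2, 12, (12 - 2)/10))        # breaks from zmin = 2: roughly 10, 20, ..., 90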
I think I do agree with you. What you want is
zpcum = function(z, zmin = 0)
{
  zmax <- max(z)
  breaks <- seq(zmin, zmax, (zmax - zmin)/10)
  d <- findInterval(z, breaks)
  d <- fast_table(d, 10)
  d <- d / sum(d) * 100
  d <- cumsum(d)[1:9]
  as.list(d)
}
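To try this outside the package, a base-R stand-in for the internal fast_table() can be defined first (the stand-in and the sample heights are assumptions):
fast_table <- function(x, n) tabulate(pmin(x, n), nbins = n)   # crude stand-in for lidR's internal helper
z <- runif(1000, 2, 12)    # heights filtered below 2 m (made up)
zpcum(z)                   # zmin = 0: layers from 0 to max(z), lowest layer empty
zpcum(z, zmin = 2)         # layers from 2 to max(z), comparable across filtered plots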
So it is just a matter of adding an optional argument in stdmetrics_z(). Are you ok to make a PR?
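If the new argument ends up being called zmin (the name is assumed here, pending the PR), a call through the formula interface might look like this, with las being a LAS object read with readLAS():
metrics <- cloud_metrics(las, ~stdmetrics_z(Z, zmin = 2))   # zmin = 2 matches the 2 m filter discussed above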
That's great, thanks for looking at this. Do you mean a feature request? edit: okay, I see it's a Pull Request. I'll try.
Great, please update stdmetrics_z, stdmetrics, the documentation and NEWS.md. I'm not currently working and I'm a bit lazy to do it, to be honest :smile:
I have a question about how lidR interprets and implements the density metrics from Wood et al. 2008. It’s more of a theoretical question to see if results are appropriate for filtered lidar.
• zpcumx: cumulative percentage of return in the xth layer according to Wood et al. 2008 (see metrics named d1, d2, ...)
Murray et al. used all lidar hits, including ground. Other documentation suggests applying a threshold of, say, 2 metres to remove the lower lidar hits. But the .stdmetrics_z code creates breaks based on zmax/10, so it disregards the loss of data from filtering.
I'm thinking the breaks should be based on the range instead, to accommodate filtering? On the other hand, maybe zpcumx is more meaningful for all hits…
thank you, Ray