function LAD() does not work with gridmetrics()

ptompalski commented 8 years ago

when running gridmetrics with function LAD() an error occurs:

LASfile <- system.file("extdata", "Megaplot.las", package="lidR")
lidar <- LoadLidar(LASfile)

#this works:
gridMetrics(lidar,20,vci(Z,zmax = 20,by = 1))

#this works:
LAD(lidar@data$Z,dz = 1,k = 0.5)

#this does not work:
gridMetrics(lidar,20,LAD(z = Z,dz = 1,k = 0.5))
#Error in hist.default(z, breaks = bk, plot = F) : 
#invalid number of breaks

points to LAD.r#65

Jean-Romain commented 8 years ago

LAD is not a metric. LAD is a profile of leaf area density as described in Bouvier et al. In other terms LAD return a vector not a number. The metric associated with LAD in Bouvier et al. is the coefficient of variation of this profile.

So it's not a bug, it's a feature. A code which works is the following one

LADCV = function(z)
{
    lad = LAD(z)
    return(sd(lad)/mean(lad))
 }

gridMetrics(lidar,20,LADCV(Z))

ptompalski commented 8 years ago

ahhhh sorry for posting before thinking.

but... I think the example section for gridmetrics could include the example above, which I am happy to add.

Jean-Romain commented 8 years ago

I agree with you, my documentation is not completely exhaustive and this example could be added. Basically I spent two days writing the documentation in one shot and I never read it after that :-)

Jean-Romain commented 8 years ago

Erratum: the previous code doesn't work because lad is a data.frame... But the main idea is correct...

ptompalski commented 8 years ago

ok, I tried to run the example and still got the same error. What am I doing wrong?

LADCV <- function(z) {
  lad <- LAD(z)[,2]
  return(sd(lad)/mean(lad))
}
gridMetrics(lidar,20,LADCV(Z))

Jean-Romain commented 8 years ago

That is a real bug.

This is the bug you've got in the first post. I read too quickly your message because you miss used the LAD function and then a bug was expected...

Did you filtered the points in the lake ? If no the problem come from that. In the lake the Z data are 0 0 0 0 ... It is impossible to build the profile in these plots because R try to build an histogram from 0 to 0. I must add a condition to avoid the error and return NA

I always remove the points in the lake so I forgot this case.

Jean-Romain commented 8 years ago

Let me clarify the problem with LAD. LAD is a function which is very hard to manipulate. I don't like this function.

When you use gridMetrics(), the algorithm makes the cells and send, for each cell, the lidar data required by the function (i.e. Z in this case). It only send the values to the function but it doesn't make any control.

When you process a large set of cells you can be sure that you will have at least one weird cell. For example a cell which fall into a lake and for which each Z values are equal to 0. For the mean height it's not a problem it returns 0. But for LAD it's a problem because there are 0 layer. So I updated the code to return NA_real_ when there are less than 3 layers. That's ok.

But the LAD function have other problems. If there are no point on layer 1 and 2 for example (i.e. that 0 point reach the ground nor the first layer) it returns NA for these layer by definition of the gap fraction given by Bouvier. The computation of the gap fraction is no possible so the function returns NA for these layers. To compute the mean and the sd we must remove the NAs.

This function cannot crash (I hope). But it's tricky, it require to perfectly understand the LAD function. I can add this function in the package but I made some choices in this function. My choice was to remove all the NAs. But maybe it doesn't make sense everywhere. For example if only one layer is not NA. This function will work but it doesn't have a meaning.

LADCV <- function(z) 
{
  lad <- LAD(z)

  if(class(lad) == "numeric")
    return(NA_real_)

  r = sd(lad$lad, na.rm=TRUE)/mean(lad$lad, na.rm=TRUE)

  return(r)
}

The metric described by Bouvier is interesting and it doesn't imply bug if you apply it on ground inventories with high pulse density. But if you apply it on the whole forest you will find a lot of special cases that must be analyse case by case. I mean lake, wetland, large gap, very very dense canopy, negative values etc.

ptompalski commented 8 years ago

I suspected this is the problem. I understand your concenrs, and I think the function should not be included in a package, but rather as an example in gridmetrics(). This seems to be something unique, but on the other hand, if there are users who'd like to create their own functions, this might be a nice way to demonstrate how to avoid errors during runs and that they should keep in mind that for some portions of the lidar dataset the metrics can be calculated with a very limited data.

ptompalski commented 8 years ago

Hi again,

I had one more thought on this. If the gridmetrics() function is sensitive for the return values of the used metric function, there should be additional mechanism inside that checks if the metric function is run properly. If not, NA value could be returned for those cells only, with a warning message for a user (maybe: "Warning. NA values produced for some cells"). This could be extended further. If a new user like me runs LAD() with gridmetrics() all values returned would be NA in this case. We could check if all cells contain NAs and return warning message ("NA values produced for all cells"). What do you think?

Jean-Romain commented 8 years ago

Not sure to understand what do you mean.

there should be additional mechanism inside that checks if the metric function is run properly

What kind of mechanism do you have in mind ? What do you mean by "metric run properly" ? It easy to say but most difficult to properly define. I have some ideas in mind since a while but without well defined problem it's difficult to choose a good solution. Furthermore there is a problem of time computation. The gridMetrics function I wrote is very fast (for a R code thanks to data.table) but the process of a complete dataset is long because there a lot a data and R is not dedicated to this kind of work. It's very easy to turn this function to a super slow function. That why I must be careful to too complex mechanism.

If a new user like me runs LAD() with gridmetrics() all values returned would be NA in this case. We could check if all cells contain NAs and return warning message ("NA values produced for all cells")

I could easly check if the function return a list of number or if the function return a more complex objects. This is one of the "mechanism" I have in mind. Easy and costless. But I'm not sure about what is the best way to do that properly. There are several possibilities. None of them can provide a 100% reliable test. At least it 100% reliable to prevent the user to miss-use LAD function :-)

ptompalski commented 8 years ago

What I had in mind, is that when gridmetrics() first divides the pointcloud into subsets (cells), you can check:

if the subset has more than, say, 3 points (if less, then function is not run, NA returned)
if the function run on this subset returned a single value (if not, NA returned)

Is this what you were thinking as well?

Jean-Romain commented 8 years ago

It's not my job. I'm not the one who must decide what is the behavior of the function if the number of points is lower than n. This is the user's choice. For example canopyModelis based on gridMetrics. In this case there is sometime only 1 or 2 points in the cells. It's not a problem.

That is why the question is more difficult than it appears. I cannot decide anything instead of the user. The user do whatever he want. The only test I can provide is to check if the function used by the user returns a list of numeric values and send and error if the function returns something else. And even that cannot be 100% reliable.

r-lidar / lidR

function LAD() does not work with gridmetrics() #2