Issue with gap_fraction_profile()

ouroukhai commented 9 months ago

Hello lidR team, And thank you once again for the incredible package you provide!

I was working with gap_fraction_profile() function and noticed these lines:

    bk <- seq(floor((min(z) - z0)/dz) * dz + z0, ceiling((max(z) - z0)/dz) * dz + z0, dz); bk
    histogram <- graphics::hist(z, breaks = bk)
    h <- histogram$mids
    p <- histogram$counts/sum(histogram$counts)
    p <- c(p, 0)
    cs <- cumsum(p)
    i <- data.table::shift(cs)/cs

I think the last line should be:

 i <- data.table::shift(cs)/(1-cs)

In the cited paper Bouvier et al. 2015, the gap fraction is calculated as follows:

$Pi = \frac{N{[0,z]}}{NT - N{[0,z + dz]}}$, for the i-th layer. My understanding is that cs calculates the empirical distribution function for the upper bound of the histogram clusters. By investigating further and calculating LAI for some randomly generated z:

z <- rnorm(300, 12,2); z

The initial algorithm's results seem way more reasonable. What do I miss here? Why does this seem different than the citation? Sincerely,

P.S. If you could also add a comment on why p <- c(p, 0) and not p <- c(0, p) it would be great :)

Jean-Romain commented 9 months ago

I'm not saying there is no bug in the code but... if your justification is the Bouvier et al. equation you must know that the equation is incorrect, it does not match with the text of the paper. Read the text, rewrite the proper equation based on the text and tell me if you still think that there is a bug.

ouroukhai commented 9 months ago

I will agree with you it seems incorrect. It is not even in [0,1), but I make no sense of the text too. Please leave this open a little bit more, I will return here if I have something more.

Jean-Romain commented 9 months ago

You do not understand the text, or you do not understand how my code map to the text ? It is a little tricky because it is a vectorized version. I do not remember the details of the paper or the details of the code myself btw. I did this about 10 years ago.

ouroukhai commented 9 months ago

To be honest, I do not understand either. In your code I do not understand why the ratios cs[i-1]/cs[i] are equal to the gap fraction for the height h[i-1]. In the text of the mentioned paper I understand nothing. Maybe it's time to get some rest for the day and continue tomorrow..

ouroukhai commented 9 months ago

Hello, I return to validate that the function of the package works as intended. For future reference, I will include my current understanding. The correct equation must be: $Pi = \frac{N{[0,z]}}{N_{[0,z + dz]}}$, (As it is indeed the way it is implemented in the package) where $[z,z+dz]$ denotes the i-th layer starting from $z0$ (i.e. the sets $[b{i-1},bi)$ (the sets with center $h{i-1}$, created by the variable bk) in the current implementation). The next argument can explain the aforementioned equation: Gap probability at layer-i can be viewed as the probability of a randomly thrown point (assume for simplicity from top to bottom) to the canopy reaching and passing layer i. So the estimated Gap probability (if we throw many points) is the fraction of the number of points that pass the i-th layer $(N_{[0,z]})$ to the number of points that reach the i-th layer $(N_{[0,z+dz]})$. Thank you again for your time. Please correct me if any of the above claims seem incorrect.

Jean-Romain commented 9 months ago

You are 100% correct !

And I'm glad to learn that we can write $\LaTeX$ equations on Github

r-lidar / lidR

Issue with gap_fraction_profile() #739