r-lidar / lidR

Airborne LiDAR data manipulation and visualisation for forestry application
https://CRAN.R-project.org/package=lidR
GNU General Public License v3.0
582 stars 130 forks source link

.my_std_metrics works with Z but not with Intensity #704

Closed ouroukhai closed 1 year ago

ouroukhai commented 1 year ago

Hello,

I have a normalized catalog (nctg) and I want to apply .my_std_metrics function:

.my_std_metrics <- function(x)
{
  # keep name of variable (str_remove for forward compatibility)
  .name <- stringr::str_remove( deparse( substitute(x) ), ".list[[i]]" ); .name 

  # stats
  stats <- purrr::discard(DescTools::Desc(x)[[1]], is.character); stats
  remove_names <- c("x","freq","small","large", "mode","modefreq_crit","quant","label") ; stats
  stats <- stats[!(names(stats) %in% remove_names)]; 
  names(stats) <- paste0(.name, names(stats)); stats

  # quantiles
  q_seq <- seq(0, 1, by = 0.05)
  quantiles <- sapply(q_seq, function(q) quantile(x, q)); quantiles
  names(quantiles) <- paste0(.name, "q", q_seq * 100)

  # percentages
  perc <- list(pabove2 = pabove(x, .low = 2),
               p515 = pabove(x, .low = 5, .upper = 15),
               pbet1525 = pabove(x, .low = 15, .upper = 25))
  names(perc) <- paste0(.name,names(perc))

  ans <- c(perc, stats, quantiles)

  and
}

using the grid_metrics(). The following line works:

opt_filter(nctg) <- "-keep_z 0 35 && -keep_class 5"
Z_metr <- grid_metrics(nctg, ~.my_std_metrics(Z), res = 20)

but this one does NOT:

I_metr <- grid_metrics(nctg, ~.my_std_metrics(Intensity), res = 20)
#> Processing [======>------------------------------------------------------------------]  10% (1/10) eta: 16s
#> Error: All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.

What troubles me the most is that if I extract a file:

las <- readLAS(nctg$filename[1], filter = "-keep_class 5 && -keep_z 0 35")

then BOTH .my_std_metrics(las$Z) and .my_std_metrics(las$Intensity) work but

I_metr <- grid_metrics(las, ~.my_std_metrics(Intensity), res = 20)
#> Error in `[.data.table`(las@data, , if (!anyNA(.BY)) .my_std_metrics(Intensity),  : 
#>  All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.

again FAILS.

Trying without the filter though works:

las <- readLAS(nctg$filename[1])

If the filter is applied, I can verify that there are empty pixels (or grids).

My assumption: Applying grid_metrics on empty pixels produces the error. What is the suggested way to fortify .my_std_metrics so that even on empty pixels it works under grid_metrics()?

Edit: Printing fine tuning. Edit2: trying on the whole catalog without filter again fails

opt_filter(nctg) <- ""
I_metr <- grid_metrics(nctg, ~.my_std_metrics(Intensity), res = 20)
#> Processing [======>------------------------------------------------------------------]  10% (1/10) eta:  1m
#> Error: All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.
Jean-Romain commented 1 year ago

Hard to tell without a reproducible example but you must search for edge cases not for empty pixels. Empty pixels do not exist and your code is never called with 0 length data.

You assumed that some variables are e.g. a list but in some limit cases they may not be a list. According to the error it suggest that, for at least one pixel, some items may be vectors. An edge case is a limit case where the data are distributed in such a way that your variables are exceptionally of a different type than expected. For example a function that return a list a 3 numbers in the general case but if you input only 0s it returns NULL. Or maybe a pixel with a single point and a function that return NULL in this edge case.

The quantiles part looks ok even if you should better write it this way:

q_seq <- seq(0, 1, by = 0.05)
quantiles <- quantile(x, q_seq)
quantiles <- as.list(quantiles)

So I guess you should look at pabove() or DescTools::Desc(x) that may inconsistantly return different objects.

ouroukhai commented 1 year ago

Thank you for the fast reply. Finally, I compromised on writing each element one by one. I guess that Desc(x) returned dubious values in the extreme cases you mentioned.

 .my_std_metrics <- function(x){

  # stats
  stats <- list(n = length(x),
                NAs = sum(is.na(x)),
                unique = dplyr::n_distinct(x),
                zeros = sum(x == 0, na.rm = T),
                mean = mean(x, na.rm = T),
                meanSE = sd(x, na.rm = T)/sqrt(length(x)),
                min = min(x, na.rm = T),
                max = max(x, na.rm = T),
                sd = sd(x, na.rm = T),
                vcoef = mean(x, na.rm = T)/sd(x, na.rm = T),
                mad = sum(abs(x - median(x, na.rm = T)), na.rm = T)/length(x),
                IQR = unname(quantile(x, 0.75) - quantile(x, 0.25)),
                skew = skew(x),
                kurt = kurtosis(x))

 # quantiles
  q_seq <- seq(0, 1, by = 0.05)
  quantiles <- quantile(x, q_seq)
  quantiles <- as.list(quantiles)
  names(quantiles) <- paste0("q", names(quantiles))

  # percentages
  perc <- list(pabove2 = pabove(x, .low = 2),
               p515 = pabove(x, .low = 5, .upper = 15),
               pbet1525 = pabove(x, .low = 15, .upper = 25))

  ans <- c(perc, stats, quantiles)

  ans
}

For completeness, pabove is this function:

# function to measure percentages
pabove <- function(x, .low = 0, .upper = Inf){
  sum(dplyr::between(x,.low,.upper), na.rm = T)/sum(x > 0, na.rm = T)
} 

Edits: Print fixes.