r-lidar / lidR

Airborne LiDAR data manipulation and visualisation for forestry application
https://CRAN.R-project.org/package=lidR
GNU General Public License v3.0

treeID uniqueness options #519

Closed searoy closed 2 years ago

searoy commented 2 years ago

Using lidR::segment_trees(las, algorithm, attribute = "treeID", uniqueness = "bitmerge") creates unwieldy and weird IDs. Also, given a series of overlapping scans over time there is little assurance that the same treeID will be repeated on trees segmented using different date data.

  1. Add an option for "hexadecimal string" that converts bitmerge 64-bit binary treeID to hexadecimal and outputs it as a character string. It's still weird but it's not unwieldy.
  2. Add an option to specify (x, y) bitmerge 64-bit binary trailing zero bits, so creating a "bubble" where nearby repeat apex (x, y) extractions from different collect times are likely to merge and share a common treeID.
  3. Add an option to include defined variables into the defined attribute (treeID) as an attribute fill (or more efficient way I can't think of), for example off the top of my head, attfill = uniqueness, or even attfill = paste(jobnum, "UniverstySquare", collectdate, uniqueness, sep = '_')

Jean-Romain commented 2 years ago

> creates unwieldy and weird IDs.

This is definitely true. The documentation explains why I made this choice: it is definitely sub-optimal, but it is the best one I found (see below).

> Also, given a series of overlapping scans over time there is little assurance that the same treeID will be repeated on trees segmented using different date data.

I'd rather say that there is virtually zero chance of getting matching IDs.

> Add an option for "hexadecimal string" that converts bitmerge 64-bit binary treeID to hexadecimal and outputs it as a character string. It's still weird but it's not unwieldy.

This is not an option because the LAS file format cannot store strings. Otherwise I would have provided a UUID option, for example. The challenge is twofold: (1) dealing with R limitations, and (2) dealing with LAS format limitations.

> Add an option to specify (x, y) bitmerge 64-bit binary trailing zero bits, so creating a "bubble" where nearby repeat apex (x, y) extractions from different collect times are likely to merge and share a common treeID.

That seems to be a reasonable idea, but it is probably more complex to implement than it looks at first glance. It requires some thought to ensure that distinct trees are not unexpectedly merged.
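
To make the trade-off concrete, here is a minimal sketch of the trailing-zero-bits idea in base R. It is not lidR's implementation: the helper `zero_trailing_bits()` is hypothetical, and the integer coordinates assume apex positions stored at a scale of 0.01 m.

```r
# Hypothetical helper, not part of lidR: zero the k lowest bits of
# non-negative integer coordinates, i.e. snap them to a grid of 2^k units.
zero_trailing_bits <- function(v, k) (v %/% 2^k) * 2^k

# Assumed integer coordinates at a scale of 0.01 m (1040 means 10.40 m)
k <- 8  # 2^8 = 256 units, i.e. a 2.56 m cell

# Same tree on two dates, apices 0.30 m apart: both snap to 1024
same_tree <- zero_trailing_bits(c(1040, 1070), k)

# Apices only 0.10 m apart but straddling a cell edge: 768 and 1024
edge_case <- zero_trailing_bits(c(1020, 1030), k)
```

The second case shows why this needs care: a fixed grid keeps some close apices apart, and, conversely, two distinct trees falling in the same cell would share one ID.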

> Add an option to include defined variables into the defined attribute (treeID) as an attribute fill (or more efficient way I can't think of), for example off the top of my head, attfill = uniqueness, or even attfill = paste(jobnum, "UniverstySquare", collectdate, uniqueness, sep = '_')

I'm not sure I understand that one.

searoy commented 2 years ago

> Add an option to include defined variables into the defined attribute (treeID) as an attribute fill (or more efficient way I can't think of), for example off the top of my head, attfill = uniqueness, or even attfill = paste(jobnum, "UniverstySquare", collectdate, uniqueness, sep = '_')

> I'm not sure I understand that one.

If LAS won't support strings, that's a showstopper. But if the output is an spdf, let the user specify what fills the attribute field as a string (a UUID, as you said) rather than forcing the default treeID with uniqueness = "bitmerge" or "gpstime".

Jean-Romain commented 2 years ago

I thought about it, and both strategies work. Any other representation can only be a more or less convenient representation of the same IDs, and the conversion is easy to make.

Here the IDs are represented as hexadecimal strings:

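# Segment the trees with "bitmerge" IDs
las <- segment_trees(las, li2012(R = 3, speed_up = 5), uniqueness = "bitmerge")
# Format each numeric ID as a hexadecimal string; NA becomes the string "NA"
las$treeID <- sprintf("%A", las$treeID)
las$treeID[las$treeID == "NA"] <- NA_character_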
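
As a quick check that the hexadecimal form is only another representation of the same number, this sketch round-trips a few made-up double values through `sprintf()` and `as.numeric()`. The values are placeholders rather than real treeIDs, and the sketch uses the lowercase `"%a"` form.

```r
# Placeholder doubles standing in for numeric tree IDs (not real lidR output)
ids <- c(1, 0.1, 123456789.25)

# "%a" writes a double in hexadecimal binary notation at full precision
hex <- sprintf("%a", ids)

# as.numeric() accepts hexadecimal constants, so the conversion can be undone
back <- as.numeric(hex)
identical(back, ids)
```

If the round trip holds on your platform, the string can always be converted back to the numeric ID when it needs to be written to a LAS file.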

Jean-Romain commented 2 years ago

Here is how to transform an unwieldy ID into a nice geohash. This is the most elegant way to solve the unique-ID problem, but sadly the result is a string.

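# Return the XY coordinates of the highest point (the apex) of a tree
f <- function(x, y, z) {
  i <- which.max(z)
  list(X = x[i], Y = y[i])
}

# One apex per treeID; drop the row for unsegmented points (treeID is NA)
XY <- las@data[, f(X, Y, Z), by = treeID]
XY <- na.omit(XY)
# Reproject the apices to WGS84 longitude/latitude before hashing
XY <- sf::st_as_sf(XY, coords = c("X", "Y"), crs = sf::st_crs(las))
XY <- sf::st_transform(XY, 4326)
X <- sf::st_coordinates(XY)[, 1]  # longitude
Y <- sf::st_coordinates(XY)[, 2]  # latitude
# gh_encode() takes latitude first, then longitude
XY$treeGeoHash <- geohashTools::gh_encode(Y, X, precision = 10)
# Join the hashes back onto the points and drop the numeric treeID
hash <- sf::st_drop_geometry(XY)
data.table::setDT(hash)
las@data <- merge(las@data, hash, by = "treeID", all = TRUE)[, treeID := NULL][]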