r-lidar / lidR

Airborne LiDAR data manipulation and visualisation for forestry application
https://CRAN.R-project.org/package=lidR
GNU General Public License v3.0

About modifying the coordinates of LAS objects #372

Closed Lenostatos closed 4 years ago

Lenostatos commented 4 years ago

Dear lidR developers,

I was recently trying to shift the coordinates of a LAS object (i.e. subtract 33e+9 from the X coordinates) and encountered some difficulties. I think I've managed to do it in the end, and I hope I have understood everything correctly after reading some lidR code, the LAS specification, and this explanation on stackexchange. Nonetheless, because it was not as straightforward as I expected, I want to suggest three things that might help others who encounter the same problem:

  1. Update the documentation of las_reoffset to be even clearer about the fact that the function doesn't change coordinates but just how the coordinates are stored. Rationale: When I was looking for a lidR function to change my coordinate values, I encountered the las_reoffset function, and because I didn't know how the offset values of LAS files work, I was wondering why the function wouldn't do what I wanted, i.e. "offset" my coordinates. I could imagine that as long as one doesn't know how the offset and scale values work in LAS files, one would not expect that there is a function for changing the storage mode of points but not for changing the points themselves. At least this was the case for me. Maybe the documentation of las_reoffset and las_rescale could be extended by a note or warning along the lines of: "The las_reoffset function does not offset (i.e. shift) coordinate values but modifies how the coordinates are stored in a LAS file on disk. Read the LAS specification and this answer (permalink: https://gis.stackexchange.com/questions/360247/rescaling-and-reoffsetting-a-point-cloud-with-lidr/360253#360253) on StackExchange for details on how this works."

  2. Add a section to the lidR book that explains how a user has to update the LAS header after changing coordinates, i.e. adjust scale and offset if necessary, update the bounding box in both the header and the bbox slot and anything else which might be necessary.

  3. Even more convenient than such a section in the book would of course be a function that updates the header of a LAS object after its coordinates have been changed by the user. Maybe something like a mix of rlas::header_update and rlas::header_create where

    • the user is informed in the documentation and possibly warning messages about how a non-zero offset value shifts the range of allowed coordinate values to offset + c( -(2^32) / 2, 2^32 / 2 ) and about how a scale value different from 1 changes the 32 to something lower or higher and in any case makes it necessary to round the coordinates (I hope I have understood this part correctly)
    • if any coordinates are outside the allowed range, an error is raised
    • the user is informed about the degree to which their coordinates are rounded because of the scale value
    • the bounding box is updated in both the header (like in rlas::header_update) and the @bbox slot (like in the internal function lasupdateheader)
    • the scale and offset values are set to default values (like in rlas::header_create) or to values defined by the user

    One important question would be how the user is allowed to modify the LAS object's coordinates. Should attaching more points to the existing data be allowed? Or should the user only be allowed to change the existing coordinates?

    I have patched together some pieces of code that might serve as a first inspiration for such a function, but this is in no way meant as a working example!

    #' Update the header of a LAS object with modified coordinates
    #'
    #' Updates the header of a LAS object of which the user has modified the
    #' data.table at the \code{@data} slot.
    update_modified_coordinates <- function(modified_las,
                                            xscale = 0.01,
                                            yscale = 0.01,
                                            zscale = 0.01,
                                            xoffset = 0,
                                            yoffset = 0,
                                            zoffset = 0) {
      # assert_is_LAS(modified_las) ?
      assert_is_a_number(xoffset)
      assert_is_a_number(yoffset)
      assert_is_a_number(zoffset)
      assert_is_a_number(xscale)
      assert_is_a_number(yscale)
      assert_is_a_number(zscale)
      # Maybe some more assertions are needed?

      # Here one would try to convert the modified coordinates to integers and
      # back together with the given scale and offset, like e.g. in
      # lidR::las_reoffset, and if that succeeds, inform the user about how
      # much their coordinates are rounded with the scale value.

      # Get the number of points and the coordinate extent
      if (nrow(modified_las@data) > 0L) {
        npts <- nrow(modified_las@data)
        minx <- min(modified_las@data$X)
        miny <- min(modified_las@data$Y)
        minz <- min(modified_las@data$Z)
        maxx <- max(modified_las@data$X)
        maxy <- max(modified_las@data$Y)
        maxz <- max(modified_las@data$Z)
      } else {
        npts <- 0L
        minx <- miny <- minz <- 0
        maxx <- maxy <- maxz <- 0
      }

      # Update the header
      #
      # I'm guessing that this section needs to be extended a bit. Maybe
      # something about the returns has to be checked/updated?
      header <- as.list(modified_las@header)

      if ("ReturnNumber" %in% names(modified_las@data)) {
        n <- if (header[["Version Minor"]] < 4) 5L else 15L
        header[["Number of points by return"]] <-
          tabulate(modified_las@data$ReturnNumber, n)
      }
      header[["Number of point records"]] <- npts
      header[["Min X"]] <- minx
      header[["Min Y"]] <- miny
      header[["Min Z"]] <- minz
      header[["Max X"]] <- maxx
      header[["Max Y"]] <- maxy
      header[["Max Z"]] <- maxz
      header[["X offset"]] <- xoffset
      header[["Y offset"]] <- yoffset
      header[["Z offset"]] <- zoffset
      header[["X scale factor"]] <- xscale
      header[["Y scale factor"]] <- yscale
      header[["Z scale factor"]] <- zscale

      modified_las@header <- LASheader(header)

      # Update the bbox slot
      modified_las@bbox[1, 1] <- modified_las@header@PHB[["Min X"]]
      modified_las@bbox[1, 2] <- modified_las@header@PHB[["Max X"]]
      modified_las@bbox[2, 1] <- modified_las@header@PHB[["Min Y"]]
      modified_las@bbox[2, 2] <- modified_las@header@PHB[["Max Y"]]

      return(modified_las)
    }
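    To make the mechanism from point 1 concrete: a minimal base-R sketch (no lidR involved, my own illustration) of how a coordinate is encoded with a scale and an offset, and why re-offsetting changes the representation but not the coordinate values:

    ```r
    # How a LAS file stores one coordinate: a 32-bit integer plus scale and offset.
    coord  <- 123.456789
    scale  <- 0.01
    offset <- 100

    stored   <- round((coord - offset) / scale)  # integer actually written to disk
    restored <- stored * scale + offset          # value a reader reconstructs
    stored    # 2346
    restored  # 123.46: quantized to the scale, but not shifted

    # Re-offsetting picks a new offset and recomputes the stored integers;
    # the reconstructed coordinate stays the same (up to the scale):
    new_offset <- 0
    round((coord - new_offset) / scale) * scale + new_offset  # still 123.46
    ```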

Thank you for your consideration! Leon

Jean-Romain commented 4 years ago

I understand your concerns. Most of your questions/suggestions were actually already addressed in version 3.1 (branch devel).

Update the documentation of las_reoffset to be even more clear about the fact that the function doesn't change coordinates but just how the coordinates are stored.

In version 3.1 the function is now documented in las_utilities. The documentation has changed and now reads as follows (check out the full doc with examples):

LAS utilities

Tools to manipulate LAS object maintaining compliance with
\href{http://www.asprs.org/wp-content/uploads/2019/07/LAS_1_4_r15.pdf}{ASPRS specification}

In the specification of the LAS format the coordinates are expected to be given
with a precision e.g. 0.01 for a millimeter precision (or millifeet) meaning
that a file records 123.46 not 123.45678. [...]

Even more convenient than such a section in the book would of course be a function that updates the header of a LAS object after it's coordinates have been changed by the user. Maybe something like a mix of rlas::header_update and rlas::header_create where

In 3.1 I exported internal functions named las_update() and las_quantize(). With these new functions you have everything you need to create valid LAS objects.

the user is informed in the documentation and possibly warning messages about how a non-zero offset value shifts the range of allowed coordinate values to offset + c( -(2^32) / 2, 2^32 / 2 ) and about how a scale value different from 1 changes the 32 to something lower or higher and in any case makes it necessary to round the coordinates (I hope I have understood this part correctly)

Not sure I understand what you meant here.

if any coordinates are outside the allowed range, an error is raised

las_check() already performs many tests. It should catch cases where you did something wrong with the coordinates. But I don't think it checks the range, indeed. If you can give me a particular example of something not caught by las_check(), please report it and I will improve it.

the user is informed about the degree to which their coordinates are rounded because of the scale value

The function LAS() emits a message when building a LAS object from foreign data. Please give me a specific example of what you are expecting.

the bounding box is updated in both the header (like in rlas::header_update) and the @bbox slot (like in the internal function lasupdateheader)

lasupdateheader() has been exported in 3.1 with the name las_update(). The documentation is still coarse. If it is unclear please suggest improvements.

the scale and offset values are set to default values (like in rlas::header_create) or to values defined by the user

The scales and offsets are read from the file. When building a LAS object with LAS() from foreign data, default scales and offsets are set, but you can provide a header to force your own values. This is documented. Again, if it is not clear in the doc, I'm open to suggestions for improvements.

One important question would be, how the user is allowed to modify the LAS object's coordinates. Should it be allowed to attach more points to the existing data? Should the user only be allowed to change the existing coordinates?

You are not supposed to do that. Or, to be correct, you are not supposed to do that if you don't have a good understanding of the LAS specification. But you can do it anyway. The new tools in 3.1 simplify the task (see the documentation of las_utilities):

las$X = las$X - xshift
las$Y = las$Y - yshift
las_quantize(las)
las = las_update(las)
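
An aside on the snippet above (my own base-R illustration, not from the lidR docs): when the shift is an exact multiple of the scale factor, the shifted coordinates stay on the quantization grid, so the quantization step has nothing to change:

```r
scale  <- 0.01
x      <- c(33000123.45, 33000999.99)  # coordinates already on the 0.01 grid
xshift <- 33e6                          # shift is an exact multiple of the scale

shifted <- x - xshift
# every shifted value still sits (up to rounding error) on a multiple of the scale:
all(abs(shifted / scale - round(shifted / scale)) < 1e-5)  # TRUE
```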

Add a section to the lidR book that explains how a user has to update the LAS header after changing coordinates, i.e. adjust scale and offset if necessary, update the bounding box in both the header and the bbox slot and anything else which might be necessary.

I will consider that. Feel free to fork the book and propose something. I'm not sure I will do that in the near future.

Hope that helps. I may not have answered all your questions. Feel free to ask for more clarification.

Jean-Romain commented 4 years ago

I thought about it and I think it is worth implementing an on-the-fly quantization + header update when using las$X <-

Jean-Romain commented 4 years ago

I pushed 2 commits in devel (version 3.1).

Lenostatos commented 4 years ago

@Jean-Romain Thank you very much for your exhaustive answer and the new functionality!

I like the ideas of moving these functions to a common documentation page and overloading the coordinate assignment operator and rbind function. The examples in particular should help with modifying coordinates in the correct way.

Is the data.table reference assignment also covered by this?: las@data[, X := newx]

There are a few things that I noticed about the las_tools code:

Regarding the range check you mentioned: I haven't looked at las_check, but I have created a function that calculates the range of valid values for a given scale and offset, and optionally prints a description of how the user can change scale and offset in order to change the valid value range. Maybe this function could serve as "dynamic documentation", since the printed explanations are customized for the given scale and offset. Please try the code below and tell me what you think of it! (The comments about how many values are storable in 4 bytes are meant as general documentation. I'm sure you know all that already.)

storable_coordinate_range <- function(scale, offset, verbose = FALSE) {

  # is.number() alone only returns TRUE/FALSE; assert_that() makes it stop on failure
  assertthat::assert_that(assertthat::is.number(scale))
  assertthat::assert_that(assertthat::is.number(offset))

  # The specification formally allows negative scale values, right?
  assertthat::assert_that(!dplyr::near(scale, 0))
  if (scale < 0) warning(paste0(
    "Try to use positive scale values! Negative scale values are allowed but ",
    "don't have any benefit over positive ones while being a bit more ",
    "difficult to understand in terms of their effect."
  ))

  # LAS specification data type for coordinate values: signed long (4 bytes)
  #
  # The number of unique integers storable in 4 bytes without a sign is:
  # 2^(4 * 8) = 2^32 = 4294967296
  #
  # With signed integers one bit is used for the sign so the number of unique
  # integers storable in the remaining bits is:
  # 2^(4 * 8 - 1) = 2^31 = 2147483648
  #
  # This means that we can store the numbers from 0 to 2147483648 - 1 with both
  # a positive and a negative sign. The range of valid integer values is
  # therefore:
  # -2147483647 to 2147483647
  #
  # There are some standards that define one more negative value for that range
  # (e.g. C++ ISO/ANSI) but the LAS specification explicitly mentions the
  # ISO C99 standard, which defines the range noted above.
  #
  # See also
  # https://en.wikipedia.org/wiki/Integer_(computer_science)#Long_integer

  storable_min <- -2147483647 * scale + offset
  storable_max <-  2147483647 * scale + offset

  if (verbose) {
    num_fully_usable_decimal_digits <- -ceiling(log10(abs(scale)))

    formatted_min <- format(storable_min,
      justify = "none",
      scientific = -2,
      digits = 3,
      nsmall = 2,
      big.mark = ",",
      drop0trailing = TRUE
    )
    formatted_max <- format(storable_max,
      justify = "none",
      scientific = -2,
      digits = 3,
      nsmall = 2,
      big.mark = ",",
      drop0trailing = TRUE
    )
    formatted_width <- format(storable_max - storable_min,
      justify = "none",
      scientific = TRUE,
      digits = 1,
      big.mark = ",",
      drop0trailing = TRUE
    )

    cat(glue::glue("
      The range of storable coordinates is ", crayon::green(
      "[{formatted_min}, {formatted_max}] (width ~ {formatted_width}).
      "),
      crayon::italic(
      "  Move this range to higher or lower values by increasing or decreasing
      the offset respectively.
      "),
      "
      The current scale value({scale}) allows for ",
      crayon::green(
      if (num_fully_usable_decimal_digits > 0) {
      "coordinate values with {num_fully_usable_decimal_digits} accurately
      stored digits after the decimal point.
      "
      } else if (num_fully_usable_decimal_digits < 0) {
      "coordinate values that are accurately
      stored from the leftmost digit down to \\
      {-num_fully_usable_decimal_digits} digits before the decimal point.
      "
      } else {
      "accurately stored integer coordinates.
      "
      }),
      crayon::italic(
      "  Get x more storable digits on the right by decreasing the scale value
      by x orders of magnitude. However, this will also shrink the range of
      storable coordinates by moving the range's upper and lower end x orders of
      magnitude closer to the range center. Increase the scale value for the
      reverse effect.
      "),
      "
      When storing coordinate values or using certain algorithms that compute on
      integers, ",
      crayon::bold$red(
        "any digits beyond the precision mentioned above might be lost!"
      ),
      "

      This is because the coordinate values are converted to integers by
      processing them with the formula ",
      crayon::italic("(coordinates - offset) / scale"),
      "
      and rounding the resulting decimal numbers to integers!

      "
    ))
    # In the code of las_reoffset I have seen that coordinates are truncated
    # by calling as.integer(). Could it perhaps be better to round them
    # instead?
  }

  return(c("min" = storable_min, "max" = storable_max))
}

test <- storable_coordinate_range(scale =    1, offset =    0, verbose = TRUE)
test <- storable_coordinate_range(scale = 0.01, offset =    0, verbose = TRUE)
test <- storable_coordinate_range(scale =  100, offset =    0, verbose = TRUE)
test <- storable_coordinate_range(scale = 0.01, offset = 33e6, verbose = TRUE)
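
For reference, the arithmetic behind those printed ranges can be checked directly in base R (my own numbers; 21474836.47 is the centimeter-scale limit that the function above computes):

```r
int_max <- 2147483647            # largest magnitude of a signed 32-bit integer

int_max * 1                      # 2147483647: storable range limit for scale 1
int_max * 0.01                   # 21474836.47: limit for centimeter scale
33e6 + c(-1, 1) * int_max * 0.01 # the same range shifted by a 33e6 offset
```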

Thanks again for your consideration and effort! Leon

Jean-Romain commented 4 years ago

Is the data.table reference assignment also covered by this?: las@data[, X := newx]

No, there is no way to fully forbid manual modification. The operator $ can be redefined, but I cannot redefine every single way to access the data; in particular := is controlled by data.table. But you can quantize by reference, because las_quantize() does the job by reference (see the doc):

las@data[, X := newx]
las@data[, Y := newy]
las_quantize(las)

The first sentence in the third paragraph of the documentation states that a scale value of 0.01 corresponds to millimeters but wouldn't it rather be centimeters?

yes

the documentation of the ... parameter states that it is unused but as far as I can tell it is used in the is.quantized function.

yes, but only for internal usage. This is an undocumented feature.

The term "quantize" is not intuitive for me. On merriam-webster it is defined with the meaning "to subdivide (something, such as energy) into small but measurable increments" and I don't understand how that relates to the process of "rounding coordinates to integer convertability". However, as you can see from my description I'm also lacking a term that would be intuitive for me.

las coordinates are quantized. You don't have every value available: from 0 to 1 with a scale factor of 0.01 you can only record 0, 0.01, 0.02, 0.03, ..., 0.98, 0.99, 1. My reference is the wikipedia page
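
Spelled out in base R (an illustration of the quantization grid described above; the numbers are my own):

```r
scale <- 0.01
representable <- (0:100) * scale   # every storable value between 0 and 1
length(representable)              # 101 distinct values

0.017 %in% representable           # FALSE: falls between grid points
round(0.017 / scale) * scale       # 0.02: snapped to the nearest grid point
```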

in las_reoffset you are using the "formula" as.integer((coordinates - offset) / scale) which is equivalent to trunc((coordinates - offset) / scale) according to the documentation of as.integer,

This is indeed an error I made but that I fixed later. I missed this one.
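
The truncation-versus-rounding difference discussed here, spelled out in base R (my own minimal illustration, not lidR code):

```r
coord <- 123.456789; scale <- 0.01; offset <- 0

as.integer((coord - offset) / scale)          # 12345: truncates toward zero
round((coord - offset) / scale)               # 12346: rounds to nearest

as.integer((coord - offset) / scale) * scale  # 123.45: off by one scale unit
round((coord - offset) / scale) * scale       # 123.46
```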

and in las_quantize you are using the same "formula" as in las_rescale (indirectly through quantize and fast_quantization) but this time written in C++ code.

Actually las_rescale and las_reoffset were written before fast_quantization. I should reorganize the code to use a single C++ routine and not duplicate code, because duplication is more prone to bugs (and you found one).

I have created a function that calculates the range of valid values for a given scale and offset value and optionally gives a description and explanation of how the user can change scale and offset in order to change the valid value range. [...]

This is good. I'll start from your code to add some useful utility features

Lenostatos commented 4 years ago

Okay, cool. I'm happy that I could be of some help. Also, now I understand the term "quantization" :)

I won't close this issue myself because I'm not sure if you still want to add any commits for it, but I don't have anything to add here.

Jean-Romain commented 4 years ago

I added a range test:

library(lidR)

LASfile <- system.file("extdata", "MixedConifer.laz", package="lidR")
las = readLAS(LASfile)

z = las$Z
z[250] <- 123456789.01
las$Z <- z
#> Error: Trying to store values ranging in [0, 123456789.01] but storable range is [-21474836.47, 21474836.47]

las@data$Z <- z

las_check(las)
#> 
#>  Checking the data
#>   - Checking coordinates... ✓
#>   - Checking coordinates type... ✓
#>   - Checking coordinates range...
#>     ✗ Z coordinates range in [0, 123456789.01] but storable range is [-21474836.47, 21474836.47]
#>   - Checking coordinates quantization...
#>     ✗ 1 Z coordinates were not stored with a resolution compatible with scale factor 0.01 and offset 0
#>   - Checking attributes type... ✓
#> [...]

Created on 2020-09-28 by the reprex package (v0.3.0)