Extrabyte attribute values not loaded correctly from catalog containing files with different number of extrabyte attributes

jmmonnet commented 8 months ago

Hi,

I have a catalog with files containing different numbers of Extrabytes attributes. H is an Extrabyte attribute common to all files, but its values are not loaded correctly when clipping from a catalog. Here is an example with two files, one has Reflectance, Deviation and H as EB attributes, the other only has H. The mean value of H is not the same when clipping files separately, and when clipping from a catalog it depends on the order in which files appear in the catalog.

The files are available at : https://filesender.renater.fr/?s=download&token=a05daa23-6232-4aab-bed7-90c79e4ad226

I have lidR 4.1.1 from CRAN and rlas 1.7.0 installed today from git

file1 <- "EB-LHD_FXX_0913_6571_PTS_O_LAMB93_IGN69.laz"
file2 <- "EB-LHD_FXX_0913_6572_PTS_C_LAMB93_IGN69.laz"
#
roi <- sf::st_bbox(terra::ext(c(912950, 914050, 6569950, 65710050)))
#
cata <- lidR::catalog(file1)
a1 <- lidR::clip_roi(cata, roi)
mean(a1$H)
nrow(a1)
#
cata <- lidR::catalog(file2)
a2 <- lidR::clip_roi(cata, roi)
mean(a2$H)
nrow(a2)
#
setdiff(names(a1), names(a2))
#
mean(c(a1$H, a2$H))
length(c(a1$H, a2$H))
#
cata <- lidR::catalog(c(file1, file2))
a12 <- lidR::clip_roi(cata, roi)
mean(a12$H)
nrow(a12)
#
# re-order files
cata <- lidR::catalog(c(file2, file1))
a21 <- lidR::clip_roi(cata, roi)
mean(a21$H)
nrow(a21)

Jean-Romain commented 8 months ago

I see the problem. Sadly, there is nothing I can do. This is how LASlib works, and I'm not going to rewrite this part that is deep inside LASlib.

Let me try to explain what is happening:

You are trying to read and merge two files that have different types, and especially here, two different numbers of extrabytes, which is even more problematic. When doing it, you have a warning for a good reason. The names of the extrabytes do not exist in a LAS file; extrabytes attributes are just a set of extra bytes at the end of each point. They are read in order. The first attribute corresponds, for example, to the first two bytes, the second attribute to the next byte, and so on. Their position matters, not their name. The name is added later in rlas using the description in the metadata to assign names to the data.frame. This is not an inherent property of the attribute.

In your case, you are trying to read two files as one. The first file is the reference and initializes the format with three extrabytes attributes (0-"Reflectance", 1-"Deviation", 2-"H"). Then you read the second file with a single attribute (0-"H") that is added in attribute 0 of the current format, i.e., into the Reflectance. There is no way to know that their names do not match since there is no concept of a name at this point.

The same kind of stuff happens if you try to read and merge two files with and without intensity. Depending on which file is the first file, you may produce a file without intensity or get the intensity partially zeroed. The difference here is that the name "intensity" kind of exists in the sense that we always know to what the current bytes we are reading correspond for core attributes.

Overall, you should not work with incompatible files. LASlib is well-designed and handles many cases, but this one is impossible or at least very hard to handle efficiently in the general case.

You will have the same problem with LAStools, and I would bet that there is no reader on the market capable of doing what you want to do.

What you can do it to read the two files independently. Remove unwanted extrabytes and merge at R level with rbind

jmmonnet commented 8 months ago

Thank you very much for the explanations. I thought I could store the height above ground as extrabyte attribute, but it is definitely a risky option in case different file formats might exist in a dataset. Reading files independently is indeed possible but then I would lose the benefits of the catalog engine.

The french national lidarHD data is provided by different contractors, by square "blocs" which are 50 km wide. Unfortunately the LAZ file format is not the same in all "blocs". Some have extrabyte attributes, some do not. In order to normalize heights of files within a given "bloc", I need to load files from the neighbor blocs to avoid border effects. Is there a way to make sure that the point cloud data (including extrabytes attributes) of the normalized tile is preserved, in case some border tiles have a different format ? I guess I could use a two-step procedure:

computes DTMs by loading only XYZC attributes
normalize the Z values by file using the DTM vrt

Otherwise I wonder if I could loop for files in the catalog and do the following steps:

put file i in the first row of the catalog so that it is used as reference
set attribute processed to TRUE for this file and FALSE for others
use normalize_height

Jean-Romain commented 8 months ago

Honestly, I'm not sure myself. In your case I would have expected a crash in the worst case or an error. We have a somehow corrupted result with a warning.

What I can say is that, when processing by file, the central file is guaranteed to be the main file to handle these special cases. The neighbor files are read and merged with the main file, thus preserving the format of the main file. Thus, in theory, the normalization should work, preserving the extrabytes. However, what won't work is to manipulate the extrabytes because, as you can see, the values are not the ones expected. In the case of normalization, the buffer is removed in the written files, so the potentially corrupted values will be discarded. In theory, it should work as expected.

jmmonnet commented 8 months ago

Thank you for the information. It is good news that normalization preserves the extrabytes attributes when processing by-file. For further processing of height values, I will fall back to the old strategy of saving additional files with height in the Z attribute, as extrabyte values are not guaranteed to be loaded correctly for all files in case of different formats.

r-lidar / lidR

Extrabyte attribute values not loaded correctly from catalog containing files with different number of extrabyte attributes #748