r-lidar / rlas

R package to read and write las and laz files used to store LiDAR data
https://cran.r-project.org/package=rlas
GNU General Public License v3.0
34 stars 14 forks source link

Classification flags are missing #20

Closed spono closed 6 years ago

spono commented 6 years ago

Ciao JR, I tried to import some data pre-processed with LAStools but I have some issues:

Am i missing something or is there any plan to enhance this functionality?

all the best

Jean-Romain commented 6 years ago

What is the info you are trying to read in the VLR? This is true that all the VLRs are not supported yet.

VLR extra bytes attributes are supported since rlas 1.1.8 or maybe 1.1.9. Extra bytes attributes are extra informations store along each point. But I guess this is not this kind of information that you are looking for, otherwise VLR should not be empty.

So please tell me more. I can't guess what is withheld.

spono commented 6 years ago

I'll try to explain (even if i have to admit that Extra Bytes are for me still something not completely clear). the 'withheld flag' is explained at page 11 of the ASPRS LAS specification, as a value written in bit 7 for values that are present but that should be avoided (when specified).

In this case, the VLR that I want to read is related to the buffer info (real extent and buffered points) that are written from LAStile when tiling; this allows to easily drop the buffer ones.

I don't know if all these info are stored in the EB, but in the meantime I'll try 1.2.0/1.2.1 the soonest.

Jean-Romain commented 6 years ago

I got it.

This is definitively not supported in rlas yet. rlas currently read the value of the classification not this kind of metadata that comes with the classification (to be honnest I missed this table in the spec). Also notice that this flag is not related to VLR.

Regarding VLR and buffer info, all the VLR are not supported yet. I don't know what kind of data is stored by lastile. Is it documented somewhere?

spono commented 6 years ago

ah, ok. BTW, here the discussion with Martin Isemburg on the issue https://groups.google.com/forum/#!msg/lastools/MQi7Ios5JMg/HR5bOY1bBgAJ;context-place=forum/lastools] and you can find the documentation in the first link

Jean-Romain commented 6 years ago

I can support the flags that come along with the classification. I can do it maybe before the end of the next week in a memory suboptimized way. However the point related to the VLR will not be supported.

Jean-Romain commented 6 years ago

Well, @spono your question raised too many questions of optimization. I will explain everything here. This is an open discussion. @floriandeboissieu your opinion is more than welcome.

Context

The classification information in a las file is built upon 8 bits 00110010. The 5 firsts bit represent the classification number and the 3 last ones represent 3 flags. So the previous number must be read like that: 00110 | 0 | 1 | 0 which mean 6, false, true, false. This is what the page 11 of the spec tells us.

Currently only the 5 bits are read and the flags are skipped. I can read the flag. There is no difficulty at all. Within half an hour I can support that and the output of rlas will be a table with in addition to the regular columns, 3 new columns containing the 3 flags.

Problem

That being said, this is sooo unoptimized. The flags (and this is also true for the columns EdgeOfFlightline and ScanDirectionFlag) are stored on a single bit per point. We cannot stored this information in a single bit because in R there are only two types of number: 32 bits and 64 bits. Thus we have to use 32 bits per points to store the information instead of 1 bit per point. This will use 32 times more memory than actually needed to store these flags.

So if I use 4 columns to store the classification + the flags I will used 128 bits of memory per point instead of 8. Considering the inefficiency of R to manage the memory, the low level of knowledge about memory of many R users and the fact that object loaded are already too big in my opinion for the reason explained here I don't really want to expend too much the table read from a las file.

Solutions

I will close the discussion right now. There is no solution!! But we can discuss about more or less clever hacks.

The first one obviously is the one that already exists and which consists in using the select argument to load only the columns of interest. It implies to creates new letters to selectively load some flags. I don't like this options. Firstly, many letters are already used. adding new one will be confusion. Moreover if you use select = "c" you want the classification and the flags should come with the data. I don't think it is a good idea to selectively load some flags but not the classification or any other weird combination. In addition it does not solve the problem because by default everything is loaded.

The second option is to actually store the information on a single bit. This is possible within the bit package. The bit package use 32 bits integers to vehicle 32 boolean informations. This is a clever hack made compatible with regular R code thank to a robust underlying C that surcharge many R functions. With the package bit you can store 1000 binary values using only 32 numbers instead of 1000. I can use this trick and I'm able to create a bit object at the C++ level. So I can do it.

However this is not nice a solution. A bit object cannot be stored in a table. Thus, yes I'm able to get an optimized representation of the flags but I cannot have a compatibility with previous features. Indeed flags must be stored in a structure that is no a table. In other words in a different object than the regular data.

In lidR it mean you would not be able to do something like that

lasfilter(las, withheld == FALSE)

But maybe something like

lasfilter(las, las@flags$withheld == FALSE)

I have to change both rlas and lidR for compatibility reasons. Also maybe some internal code in lidR may emulate the compatibility. Everything is actually possible if working hard enough...

Conclusion

We have two solutions.

  1. Store the flags in a deeply unoptimized way in term of memory. In that case everything will be immediately compatible with everything that currently exists. This is what is currently done for EdgeOfFlightline and ScanDirectionFlag. It works but I don't like it.
  2. Store the flag in a very clever way taking advantage of bit. In that case, flags will be some special data that can't come along with the table. This add some difficulties for the user, it must be documented and well documented both in rlas and lidR. I must think a lot about the best integration. This is not trivial at all and I don't like it.
floriandeboissieu commented 6 years ago

Hi Jean-Romain and Spono, Jean-Romain, I agree with all your statements and have nothing more to add. However, in my opinion, I think the first objective of a reader/writer would be to keep the integrity of all the data, that is not loosing anything on the way. To have access and make it easy to all the information come as second and third objective. The first solution seems ok to me for the moment. You would also have the intermediate solution to group all the flags in an integer and add a decoding function when used: lasfilter(las, lasflags(flags, "withheld") == FALSE) Cheers Florian

floriandeboissieu commented 6 years ago

or even make the computing internal to lasfilter interpreting the statement withheld == FALSE

floriandeboissieu commented 6 years ago

But the first solution seems the best in the short term.

spono commented 6 years ago

as @floriandeboissieu said, it would be good at least not to loose the information in the reading/writing process [if possible]. Optimizations are always possible but not always strictly needed.

Said this, unfortunately I can't join the discussion in a useful way: I definitely not have the technical background to help/judge...so I totally rely on your skills :)

Jean-Romain commented 6 years ago

The branch classification_flag has a ready to use version of the reader (no writer yet but I can probably do it today).

I chose the easy way. The goal of lidR is to be straightforward even at the cost of very unoptimized things. I did it in a similar way than what already exists. Thus it is 100% compatible with lidR. 3 new columns and 3 new flags in read.las: s for synthetic flag, k for key points, w for withheld. This is not documented in lidR but it works too in the version 1.5.0.

# everyting but synthetic flag and keypoint flag
lasdata <- read.las(lasfile, select = "* -s -k")

Currently the flags are loaded as boolean. This is not guarantee to be definitive. Names are maybe not definitive too.

I also plan to count the number of withheld and synthetic points while reading even if the user do not load these information. And I will print a summary (if required) about these fields for example if more than 0 withhelp points are loaded.

spono commented 6 years ago

Ciao JR, just tested the classification_flag branch + lidr 1.5 but I get the following error while loading the point cloud:

Error in .Call(_rlas_C_reader, ifiles, ofile, filter, i, r, n, d, e, : Incorrect number of arguments (19), expecting 16 for '_rlas_C_reader'

spono commented 6 years ago

using instead the classification_flag branch + lidr master, it works; when the flag -w is used, it loads only XYZ.

Jean-Romain commented 6 years ago

I close this issue. Flags are supported both in read and write mode in the branch master (v1.2.2)