Write extra byte - Githubissues

floriandeboissieu commented 6 years ago

Here is a first version to write extra bytes, with associated testthat (tests/test-writelas.R). All LAS 1.4 capacities are respected (10 data_types available), except for the number of dimensions limited to 1 as LAStools put this limitation.

To make it work I added ExtraBytes à C++ writelas function. As DataFrame default value (DataFrame(0)) is not understood by compiler when in function arguments, this argument had to be mandatory and located before other arguments.

There seem to be some issues with valgrind check...

Jean-Romain commented 6 years ago

Thank you so much I will study that. I read through the code very quickly. I will probably make several change but we have a good start.

For example what about this signature:

writelas = function(file, header, X, Y, Z, etc, ...)

Instead of:

writelas = function(file, header, X, Y, Z, ExtraBytes, etc)

It enables to write for example

writelas(file, header, X, Y, Z, amplitude = myvector)

floriandeboissieu commented 6 years ago

Ok, as you wish although I see it more as a complication than a simplification, on a part that is actually not the direct interface to user (that is more lidR::writeLAS) but that is just an opinion. I will PR the lidR part tonight (I was waiting to have a test file it s almost done). On this part, I am not very happy so you will have modifications on this one for sure. A major pb in my opinion is that the header is not updated by reference... thus the function add.extra_byte is not updating by reference neither. I didn't find an easy way to update a slot by reference although there are several articles on the net about it. Another point is to relate min/max values in the same scale as the values itself, i.e. unscaled values when in R, scaled and quantized when writing to file. I will change that so that this info can be used in summary for example.

So anyway, feel free to change as it please you.

Cheers

Florian

On 22/01/2018 18:47, Jean-Romain Roussel wrote:

Thank you so much I will study that. I read through the code very quickly. I will probably make several change but we have a good start.

For example what about this signature:

writelas = function(file,header,X,Y,Z,etc,...)

Instead of:

writelas = function(file,header,X,Y,Z,ExtraBytes,etc)

It enables to write for example

writelas(file,header,X,Y,Z,amplitude = myvector)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Jean-Romain/rlas/pull/12#issuecomment-359506678, or mute the thread https://github.com/notifications/unsubscribe-auth/AP54IdkyJS-Y4Qw357DUMEuxRwrvkiA9ks5tNMnNgaJpZM4RoMv4.

Jean-Romain commented 6 years ago

Ok, as you wish although I see it more as a complication than a simplification

I read the code and I do agree

A major pb in my opinion is that the header is not updated by reference... thus the function add.extra_byte is not updating by reference

It should be done at the C++ level. I already have a function update_list_byref that is not completely functional.

floriandeboissieu commented 6 years ago

Still have some compilation problem with rhub on macos. It comes from the modification I made on writeLAS with NumericVector EB[number_attributes] non-POD array with variable length... solution should be converting to std::vector<std::vector > instead, with some trick like in https://gist.github.com/floswald/ffa0e77370e03872f6ac or http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2011-November/003126.html I couldn't mak eit clear tonight so I let it to you... sorry

Jean-Romain commented 6 years ago

Ok, I will need a lot of time to review your PR. So do not expect a merge before a while. If I have a question I will contact you by email.

floriandeboissieu commented 6 years ago

Ok, no worries, I don't think I will have the time for the next 2 weeks to work on it, so it was just to tell you that it will be delayed. to next month :-).

Jean-Romain commented 6 years ago

Hi,

I read your code. I re-formated some part, changing the position of some lines of code and adding some comments.

I think I understood almost everything. There are still some parts that I understand (I mean this is easy to read the code and follow what happens) but I don't really know their purposes. Could please read the new code and fill up the comments like // ?????. This will help me to better understand the goal of each line.

Also I think we should reconsider the fact that a default description is added at the C++ level. In my opinion it would be better to create R function that help the user to build a proper header more or less automatically and refuse incomplete headers.

ps: do not consider the branch conflicts. I going to manage that myself.

Cheers

floriandeboissieu commented 6 years ago

Hi Jean-Romain, I hope your thesis presentation went well. You're right, it's time to come back to it, I'll look into it tomorrow morning. Cheers

Jean-Romain commented 6 years ago

Great. I'm making two functions. One exported make_header that automatically create a fair header from the data including the generation of proper VLR data. And one internal check_header that check if the header contains enough data and nothing is missing.

Then, at the C++ level, I will remove the guesses about VLR. At the C++ level the function is expected to receive valid data in the header. The C++ code will be more clear and this also will enable a better flexibility for the user.

Jean-Romain commented 6 years ago

Forget what I said about comment. I read the spec and now it makes more sense. I'm close to the end. The new writelas function will be much better than the old one.

floriandeboissieu commented 6 years ago

On 20/02/2018 22:01, Jean-Romain Roussel wrote:

Great. I'm making two functions. One exported |make_header| that automatically create a fair header from the data including the generation of proper VLR data. And one internal |check_header| that check if the header contains enough data and nothing is missing.

Then, at the C++ level, I will remove the guesses about VLR. At the C++ level the function is expected to receive valid data in the header. The C++ code will be more clear and this also will enable a better flexibility for the user.

Before you start the merge does not contains the last commits !!! (23 jan., see https://github.com/floriandeboissieu/rlas/commits/write_extra_byte), it does not work the way it is coded in your merge, sorry :

no_data, min, max should be stored in the same way as the core data, i.e. clamped, and quantized. If you do not do it like this values will be wrong, this is why you cannot group the types in the switch for has_min and has_max.
except setting the storing type, all that has to do with storing matters (scaling, quantizing and clamping) should be in the C++ and not in the R in my opinion, as it it should be independent of the usage. Morever, options tells you the storing type, however in R you only have type int and float. So clamping and quantize should be done all in the same place, in my opinion in the C++ code as in the last commits of the pull-request (23 jan.)
Also to keep in mind, as no_data, min and max give information on then NA value and the range of the data, I think that in the VLR slot the no_data, min and max of each attribute should inform on the unscaled data, i.e. not the one stored in the file that are scaled. See commit https://github.com/Jean-Romain/rlas/pull/12/commits/fc24437bb30ca320fa330032e904e8d7a95a0a36

please wait until tomorrow, we should chat about it before.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Jean-Romain/rlas/pull/12#issuecomment-367117938, or mute the thread https://github.com/notifications/unsubscribe-auth/AP54IfPKXqG5-NO-j9F2yRelYTKsJOvPks5tWzLBgaJpZM4RoMv4.

Jean-Romain commented 6 years ago

Morever, options tells you the storing type, however in R you only have type int and float.

Actually doublenot float but you're right. So in absence of header we can generate on and set the options to 7 or 9. With more code we can even test the range and choose a better type at the R level.

So clamping and quantize should be done all in the same place, in my opinion in the C++ code as in the last commits of the pull-request (23 jan.)

Yes. I do no plan to change anything.

please wait until tomorrow, we should chat about it before.

Sure. But I'm not changing how las are written. I'm only trying to make the function more flexible. Basically my plan is the following:

write.las accepts two parameters. A data.frame instead of X, Y, Z and so on. And a header.
If the data.frame contains the core data, each field is written in the las respecting the header provided. For example if the header states the point data format is 0 and R G B are provided, they are not written.
If the data.frame contains extra data that are not it the core data they are saved as extra byte only if the header states they are extra byte.
- The function make_header take care of making a fair enough header at the R level. It means that it detect the point data format and it detects the potential extra bytes and creates valid info about extra bytes
- The function check_header check internally if the header is in accordance with the data.
- Everything that can be computed and verified at the C++ level is done at the C++ level.

This a lot of work but it is only sugar. I do not change the way you write files. I'm expecting the following code to write valid las whatever the content of data.

data = data.frame(....)
header = make_header(data)
write.las(header, data)

Jean-Romain commented 6 years ago

OMG I changed hundred of lines of code without checking anything and it works without any bug! :muscle:

Give me few more days (I must add more control condition to check wrong inputs at the R level) and I will push that in the write_extrabyte branch. You will be able to review that to find some potential bug.

floriandeboissieu commented 6 years ago

Alright, your plan fits exactly what I had in mind before I stopped, I am reassured :-) Making it so fast was impressive!

Jean-Romain commented 6 years ago

Well I have still some work to check if everything is correct and exhaustive (I wrote 300 lines of very boring code) but it is fully functional an it preserves everything you did. You will be able to review the code starting from testthat (this is easier in my opinion to start by the test).

R function writelas is deprecated for write.las. No more ridiculous number of input. Just file, header and data
C++ function laswriter is deprecated for C_writer. No more ridiculous number of input too. It has been rewritten from your code removing few redundant code and adding a flexible support of any data input.
New R function make_header capable of generating a fair header from data (e.g. correct point format, correct extra byte description)
New R internal function check_header called in write.las that ensure the header contains enough information and the C++ code cannot fail
New R internal function check_data called in write.las that check if the data contains proper type (e.g. X Y Z are double)
New R internal function check_header_data called in write.las that check if the header is in accordance with the data (e.g. extra bytes description do not describe a attribute that is not in the data, point format is in accordance with the data)
I updated testthat

The branch write_extrabyte has deep conflict with master but it does not matter. Do not consider other branches.

To do:

A function update_header that selectively update the header preserving the data that do not need to be updated
C++ code can be improve. I think there is still possibility to factorize the code and to remove useless code (e.g. are attribute_index and attribute_start really mandatory?)
Write mooore tests.

I'm not going to work more on this feature until Monday.

floriandeboissieu commented 6 years ago

you merged before my last commits of the PR (23 jan.) solving lots of things, please have a look : 958dcd7 , fc24437,
71e3a42

If necessary I can do the merge this W.E. and make a PR on this branch :-) I'll also look into the code this W.E.

Cheers

Jean-Romain commented 6 years ago

Yes, please add the missing commits. I don't want to do it myself because I'm afraid of missing something. It is not easy to cherry-pick manually some lines of code without error.

Also do not consider writelas. writelas is still in the package for compatibility with old versions of lidR and users's codes. But it does not matter if extra byte are not supported. What matter is write.las.

floriandeboissieu commented 6 years ago

Ok

floriandeboissieu commented 6 years ago

I'm on it this morning

floriandeboissieu commented 6 years ago

Hi Jean-Romain, merge conflicts solved, as well as non-POD error on macos rhub check.

About general functioning of write.las, and the make_header function in particular, as it is at the moment, it is not very useful. In my opinion there should be two behaviours:

the user (or lidR) update the header with extra byte parameters (scale, offset, no_data value, data type with which it should be stored)--> the header is corresponding well to data --> all is working well
the user does not make or does not want to bother with the header --> the header should be updated automatically with default values.
1. To be straight forward (or transparent for the user), make_header should be called in in write.las if check is not good.
2. Arguments like scale, offset, no_data value, type should be available in make_header, in case the user wants to specify the values.

Want is your opinion about that?

floriandeboissieu commented 6 years ago

By the way, an interesting article that I should have read earlier when writing/correcting the merge conflicts: http://gallery.rcpp.org/articles/rcpp-wrap-and-recurse/

At the moment I am converting all extra byte to NumericVector. I think with integer variable extra byte the way it is now (laswriter/C_writer) did not made conversion error, but i am not so sure of my memory and should integrated into tests... if not working in such a case it might be useful to use the trick to test whether the extra byte variable is integer or double.

Jean-Romain commented 6 years ago

Want is your opinion about that?

I do agree with everything :smile:

Jean-Romain commented 6 years ago

Well actually not with everything. make_header is useful. If you don't have a header at all it allows the user to get a header which is already filled. Then the user can selectively modify some part without creating everything from scratch.

r-lidar / rlas

Write extra byte #12