Extra bytes - Githubissues

floriandeboissieu commented 6 years ago

Dear Jean-Romain, as I needed it, I just added the EXTRA BYTE support for reading to rlas and lidR, data and header. All was already there in LASlib and LASzip. I also added a test in test_that in rlas package on a small dataset submitted as well. I let you check, and if it is OK wth you add a line in the news.md. Cheers, Florian

Jean-Romain commented 6 years ago

Whooo you're awsome.

I read the changes but to be honest there are to many changes I have to read the entire code in an IDE. I'm glad you did it because I was no really comfortable with the extra bytes in the spec. This is why it was no implemented yet.

I'm gonna look at that and I will come back with questions and comments. However there is at least one point to discuss. You enable to read the extra bytes so we must now enable to write them :wink:

Jean-Romain commented 6 years ago

I read the code. You did that very well and it is a great improvement :smile: . It is alway pleasant to review your PR. I have few questions and very minor issues to report. I can fix it or you can do it as you wish.

Questions:

Why in the code you chose at or At as a variable name to refer extra bytes. It is a question of readability. I can't find the logic.
Where in the spec did you read that there is a maximum of 10 extra fields ? I can't find this information
point.get_attribute_as_float(0). Is the data necessarily a float? What about storage of char, double, int?

Design mistake

RLASstreamer::allocation() should not be a new function. The code should belong in initialize with the other allocations.
at option comes before t option. It should be after. It is more logic in my opinion. First the regular data, then the optional ones.

floriandeboissieu commented 6 years ago

Hi Jean-Romain,

happy to here you liked it :-D.

I'll make the modifications this W.E. I'll look for the writer by the way (did not have the time this week).

About allocation, did you noticed that you are reserving (thus allocating) memory for vectors at initialisation of the streamer, before updating the variable selection options e.g. stream.read_rgb? Is there a reason for that (speed, avoid memory leaks,...)? That is why I made allocation separatly in allocation function that is called after updating the variable selection options.

Cheers

Florian

On 16/11/2017 15:48, Jean-Romain Roussel wrote:

I read the code. You did that very well and it is a great improvement 😄 . It is alway pleasant to review your PR. I have few question and very minor issues to report. I can fix it or you can do it as you wish.

Questions:

Why in the code you chose |at| or |At| as a variable name to refer |extra bytes|. It is a question of readability. I can't find the logic.

In LAStools it is called attribute (cf las2txt.cpp and lasattributer.cpp

Where in the spec did you read that there is a maximum of 10 extra fields ? I can't find this information

Actually, you are right it is not limited to 10. However, it is limited here for 2 reasons:

to be called in the select argument of readLAS (actually in the same way as LAStools e.g. "txyz012",

as I implemented declaring the vectors as members of the class, it had to be limited. I wanted to keep the way you coded the other options. Apart from these reasons, I thought of making a dynamic list of attributes that would be defined , but did not find a way of doing it properly. Another solution would be a Rcpp::NumericMatrix but I thought it might be more contraining for memory... Actually, I think a good way would be to define it in RLASstreamer::allocation. I'll try it this W.E.

|point.get_attribute_as_float(0)|. Is the data necessarily a float? What about storage of char, double, int?

You're write but I did not find an easy way to define it dynamically. However, with the idea above would solve it also. I'll try it this W.E. also.

Design mistake

|RLASstreamer::allocation()| should not be a new function. The code should belong in |initialize| with the other allocations.

Ok, but then the selection options should also be passed to the RLASstreamer::RLASstreamer , otherwise it would allocate memory to useless vectors if I read well the code.

|at| option comes before |t| option. It should be after. It is more logic in my opinion. First the regular data, then the optional ones.

Ok, as you wish. I'll make the modifications this W.E.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Jean-Romain/rlas/pull/6#issuecomment-344944827, or mute the thread https://github.com/notifications/unsubscribe-auth/AP54IS16EGqqPsExJrCsrtVuYhB7VlqVks5s3EtUgaJpZM4Qf1Yq.

Jean-Romain commented 6 years ago

About allocation, did you noticed that you are reserving (thus allocating) memory for vectors at initialisation of the streamer, before updating the variable selection options e.g. stream.read_rgb?

OMG. You're right ! Well what can answer to save face... Nothing... it is a big mistake from a good idea. Ok so, initialize should be a public function explicitly called.

Jean-Romain commented 6 years ago

Apart from these reasons, I thought of making a dynamic list of attributes that would be defined , but did not find a way of doing it properly

I do agree. What about a vector of int instead of a vector of bool. Then we can build a vector of vector with the size of the vector of int and read the corrrect extra byte using the vector of int.

Something like that

std::vector<int> extrabyte_numbers; // contains 0,2,5,24 for this example

std::vector< std::vector<double> > At(extrabyte_numbers.size());

for (i in 1:extrabyte_number.size())
{
  At[i].push_back(point.get_attribute_as_float(extrabyte_number[i]));
}

floriandeboissieu commented 6 years ago

Yes, that was what I was thinking, but then an attribute argument would have to be added to lidR::readLAS, otherwise with select string it would not be possible to know the difference between attribute 24 and arguments 2 and 4. I understand it is the reason of your comment on corresponding lidR patch. I see two solutions:

adding a numeric vector argument to lidR::readLAS with the list of attributes (same way as for rlas::readlasdata)
giving access only to the 10 first extra_bytes with select, and to all extra_bytes with *+

By the way, I tried to manage different attribute types (actually integer or double) with a template, but was not successful (It should be possible but my C++ is not good enough). An easy solution would be with boost library, but I suppose you didn't added it for a reason. If not, it could be a solution.

Cheers,

Florian

On 17/11/2017 20:54, Jean-Romain Roussel wrote:

Apart from these reasons, I thought of making a dynamic list of
attributes that would be defined , but did not find a way of doing
it properly
I do agree. What about a vector of int instead of a vector of bool. Then we can build a vector of vector with the size of the vector of int and read the corrrect extra byte using the vector of int.

Something like that

std::vector extrabyte_numbers;// contains 0,2,5,24 for this example

std::vector< std::vector >At(extrabyte_numbers.size());

for (i in extrabyte_numbers) { At[i].push_back(point.get_attribute_as_float(i) }

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Jean-Romain/rlas/pull/6#issuecomment-345349596, or mute the thread https://github.com/notifications/unsubscribe-auth/AP54IfdvuRXQMuxvvOnco3n6czWGjgrPks5s3eRwgaJpZM4Qf1Yq.

floriandeboissieu commented 6 years ago

I made the following changes:

at --> eb more explicit
At --> ExtraBytes more explicit
ExtraBytes is an std::vector< std::vector > allocated in function of eb. Thus size of eb is now unlimited.
eb is now one-based in R and zero based in C++
eb=0 means "load all extra bytes".
added unique filter on eb in C++ RLASstreamer::read_eb

I think it is a good trade off for the moment, one can access all the attribbutes as well as specifically the first ones.

Differentiation between extra byte types (keep integers when possible) is still to be developped.

On 17/11/2017 20:54, Jean-Romain Roussel wrote:

Apart from these reasons, I thought of making a dynamic list of
attributes that would be defined , but did not find a way of doing
it properly
I do agree. What about a vector of int instead of a vector of bool. Then we can build a vector of vector with the size of the vector of int and read the corrrect extra byte using the vector of int.

Something like that

std::vector extrabyte_numbers;// contains 0,2,5,24 for this example

std::vector< std::vector >At(extrabyte_numbers.size());

for (i in extrabyte_numbers) { At[i].push_back(point.get_attribute_as_float(i) }

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Jean-Romain/rlas/pull/6#issuecomment-345349596, or mute the thread https://github.com/notifications/unsubscribe-auth/AP54IfdvuRXQMuxvvOnco3n6czWGjgrPks5s3eRwgaJpZM4Qf1Yq.

Jean-Romain commented 6 years ago

but then an attribute argument would have to be added to lidR::readLAS,

I do agree

adding a numeric vector argument to lidR::readLAS with the list of attributes (same way as for rlas::readlasdata)

Agree too. Something like that readLAS(..., select = "xyz", at = c("width", "pulse")) but with an automatic selection with readLAS(..., select = "*").

By the way, I tried to manage different attribute types (actually integer or double) with a template, but was not successful (It should be possible but my C++ is not good enough).

I can maybe do it. But the question is how to know the type of the data? Is is recorded somewhere?

An easy solution would be with boost library, but I suppose you didn't added it for a reason.

Because I didn't know boost when I wrote this code. I know boost now but I'm still not familiar with it.

I'm gonna read your code. Anyway I do agree with you for the moment we must find a trade-off. If we do not enable everything it does not matter. It is much better anyway.

Jean-Romain commented 6 years ago

Accepted and merged into devel. It works fine and sounds compatible with lidR without changing anything (I ran only few tests).

However I will wait a bit before to merge into master. I'm sure there are some improvements to make and some bugs. It must be more extensively tested.

The data type is contained in the header. It is therefore possible to allocated the good type. I will work on that.

To write the extra bytes however it is much more difficult. It implies to create a good header at the R level and transform it into a lasheader. Create the header is a job for lidR. rlas only write what the user tell it to write but do not take any decision. However I must find how to create a proper lasheaderfrom a list for a LAS 1.4.

Jean-Romain commented 6 years ago

Also I changed default eb = 0

floriandeboissieu commented 6 years ago

On 21/11/2017 14:05, Jean-Romain Roussel wrote:

but then an attribute argument would have to be added to
lidR::readLAS,
I do agree
adding a numeric vector argument to lidR::readLAS with the list of
attributes (same way as for rlas::readlasdata)
Agree too. Something like that |readLAS(..., select = "xyz", at = c("width", "pulse"))| but with an automatic selection with |readLAS(..., select = "*")|.

Ok, I agree. For the moment I activated the select = "0" as "all extra bytes".
By the way, I tried to manage different attribute types (actually
integer or double) with a template, but was not successful (It
should be possible but my C++ is not good enough).
I can maybe do it. But the question is how to know the type of the data? Is is recorded somewhere?

Check print_attribute in LAStools source file las2txt.cpp.

header->attributes[index].data_type gives the type in which it is stored in LAS file
header->attributes[index].has_scale() and header->attributes[index].has_offset() tells you if after rescaling it will stay an integer or become a double.

An easy solution would be with boost library, but I suppose you
didn't added it for a reason.
Because I didn't know |boost| when I wrote this code. I know |boost| now but I'm still not familiar with it.

I'm gonna read your code. Anyway I do agree with you for the moment we must find a trade-off. If we do not enable everything it does not matter. It is much better anyway.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Jean-Romain/rlas/pull/6#issuecomment-346020656, or mute the thread https://github.com/notifications/unsubscribe-auth/AP54IZj2db-u7xDbxjAf1e1S6LmLnSEpks5s4sqhgaJpZM4Qf1Yq.

floriandeboissieu commented 6 years ago

On 21/11/2017 14:05, Jean-Romain Roussel wrote:

but then an attribute argument would have to be added to
lidR::readLAS,
I do agree
adding a numeric vector argument to lidR::readLAS with the list of
attributes (same way as for rlas::readlasdata)
Agree too. Something like that |readLAS(..., select = "xyz", at = c("width", "pulse"))| but with an automatic selection with |readLAS(..., select = "*")|.
By the way, I tried to manage different attribute types (actually
integer or double) with a template, but was not successful (It
should be possible but my C++ is not good enough).
done with a numbering argument eb inside the ... argument (the reason why I renamed odlparam to aargsin, i.e. additionnal input arguments). By names it would need further development.

I can maybe do it. But the question is how to know the type of the data? Is is recorded somewhere?
An easy solution would be with boost library, but I suppose you
didn't added it for a reason.
Because I didn't know |boost| when I wrote this code. I know |boost| now but I'm still not familiar with it.

I'm gonna read your code. Anyway I do agree with you for the moment we must find a trade-off. If we do not enable everything it does not matter. It is much better anyway.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Jean-Romain/rlas/pull/6#issuecomment-346020656, or mute the thread https://github.com/notifications/unsubscribe-auth/AP54IZj2db-u7xDbxjAf1e1S6LmLnSEpks5s4sqhgaJpZM4Qf1Yq.

Jean-Romain commented 6 years ago

What do you think about boost::any ? Sounds to be a good solution. But as I said I'm not familiar with boost yet.

Jean-Romain commented 6 years ago

I solved the problem without boost. I currently support 32 bits int and 64 bit double types but we can easily extend to char into CharacterVector.

You can look how I did. Nothing difficult actually. I just created ExtraByte32 and ExtraByte64 variables. Thank to weak typing in R this is not too complicate. I was not able to solve the problem with boost even with the help of stackoverflow community.

r-lidar / rlas

Extra bytes #6