visr / LasIO.jl

Julia package for reading and writing the LAS lidar format.

add scaled_boundingbox function #28

Closed Crghilardi closed 4 years ago

Crghilardi commented 4 years ago

This adds a scaled_boundingbox function as discussed here.

visr commented 4 years ago

I assume you would want this because you are comparing it to the scaled (raw) Int32 point coordinates as they appear in the LAS file, right?

To me it would make more sense to then also have this function return Int32 coordinates, or at least Int.

I see that in PointCloudRasterizers.jl this was not done, but perhaps we should. What do you think @evetion?

evetion commented 4 years ago

Yep, I agree it should be Int32, as this is how it's stored in the las/laz file.

I still dislike the ambiguity of scaled. Should it be something else instead, e.g. stored_?

Well, we should document it carefully. Looking at PointCloudRasterizers, it could use some docs as well. I also had plans to generalize it, so it could work on MultiPoints as well. And what is going on with the indenting?

Crghilardi commented 4 years ago

OK, is there a smarter way to do that than wrapping everything in Int32()?

scaled_boundingbox(h::LasHeader) = (
    xmin = Int32((h.x_min - h.x_offset) / h.x_scale),
    ymin = Int32((h.y_min - h.y_offset) / h.y_scale),
    zmin = Int32((h.z_min - h.z_offset) / h.z_scale),
    xmax = Int32((h.x_max - h.x_offset) / h.x_scale),
    ymax = Int32((h.y_max - h.y_offset) / h.y_scale),
    zmax = Int32((h.z_max - h.z_offset) / h.z_scale)
)

I originally grabbed the code and naming from utils.jl and thought it was a bit easier to read with the spaces. I'll remove the spaces when I know what to name it as well.

visr commented 4 years ago

Hmm, I'm glad I'm not the only one confused by scaled here. I do believe it is used correctly in this PR, e.g. the integers are scaled and the floats are unscaled.

I see the term scaled is used in the LAS 1.4 spec:

Max and Min X, Y, and Z: The max and min data fields are the actual unscaled extents of the LAS point file data, specified in the coordinate system of the LAS data.
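
For reference, the core relationship this whole scaled/unscaled discussion revolves around, per the LAS point record definition:

# real-world coordinate = stored integer * scale + offset, per dimension;
# p.x here stands for the point's raw stored integer, header fields as in this PR
x = muladd(p.x, h.x_scale, h.x_offset)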

I went to laspy to see what language they use, but on https://laspy.readthedocs.io/en/latest/tut_background.html, the first blue box is actually interesting:

The various LAS specifications say that the Max and Min X Y Z fields store unscaled values, however LAS data in the wild appears not to follow this convention. Therefore, by default, laspy stores the scaled double precision values and updates header files accordingly on file close. This can be overridden by supplying one of several optional arguments to file.close(). First, you can simply not update the header at all, by specifying ignore_header_changes=True. Second, you can ask that laspy store the unscaled values explicitly, by specifying minmax_mode="unscaled".

I've only ever seen unscaled data in LAS headers. It appears to me the laspy authors also got confused and wrongly interpreted unscaled as the integers that are in the point structs.

So perhaps avoiding the use of scaled would not be a bad idea. Of the suggestions by @evetion I like stored_ the best, even though in the case of the boundingbox it is actually not stored... Or perhaps raw_? It should sound like you don't want it.

In most cases, I think it may be best to just use Float64 coordinates everywhere, and avoid confusion. And if it turns out we can optimize by keeping the original stored Int32 intact, we probably want to use a special type similar to those in FixedPointNumbers or FixedPointDecimals, but with a Float64 scale and a Float64 offset type parameter, rather than the underlying integers. I don't want to set up these packages in a way that you can choose to work in a scaled or unscaled way for everything. That will just confuse everyone.
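
To make that last idea concrete, a minimal sketch of such a type (ScaledCoord is a hypothetical name, not part of any package; S and O would be Float64 scale and offset value parameters):

# Hypothetical: stores the raw LAS integer, behaves like the real-world Float64 coordinate
struct ScaledCoord{S,O} <: Real
    raw::Int32
end

Base.Float64(c::ScaledCoord{S,O}) where {S,O} = muladd(Float64(c.raw), S, O)

# e.g. Float64(ScaledCoord{0.001,0.0}(1234)) == 1.234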

@Crghilardi not really a smarter way, but this is probably a bit more robust against floating point issues:

function scaled_boundingbox(h::LasHeader)
    (
        xmin = round(Int32, (h.x_min - h.x_offset) / h.x_scale),
        ymin = round(Int32, (h.y_min - h.y_offset) / h.y_scale),
        zmin = round(Int32, (h.z_min - h.z_offset) / h.z_scale),
        xmax = round(Int32, (h.x_max - h.x_offset) / h.x_scale),
        ymax = round(Int32, (h.y_max - h.y_offset) / h.y_scale),
        zmax = round(Int32, (h.z_max - h.z_offset) / h.z_scale)
    )
end
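
To make the floating point issue concrete (my illustration, not from the thread): plain Int32(x) conversion throws an InexactError whenever the division does not land exactly on an integer, which it often won't:

0.3 / 0.1                # 2.9999999999999996, although the exact answer is 3
Int32(0.3 / 0.1)         # throws InexactError: not an integral value
round(Int32, 0.3 / 0.1)  # 3
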
Crghilardi commented 4 years ago

:+1: Interesting to read. I am just replicating what I see used in PointCloudRasterizers.

EDIT: OK, it makes more sense after dinner. I was thinking of one thing but saying the opposite. The function in this commit is "scaled" in the sense that it takes the "real world" coordinates and scales them to the computer-friendly Ints. I like the stored_ naming as well.

Also, my experience is the same as yours: I have only ever seen .las headers reported in "real world" / unscaled units.

Does it make more sense to not have this function at all and re-work the index function to calculate with unscaled values? I don't have any opinion or insight, just asking to understand better. I may still be confusing things across the different packages.

visr commented 4 years ago

Ha ok, I asked for clarification on the use of scaled versus unscaled in https://github.com/ASPRSorg/LAS/issues/89.

Does it make more sense to not have this function at all and re-work the index function to calculate with unscaled values?

I think so. Algorithms can of course use stored integer coordinates internally if that makes a significant performance difference. But generally it's probably best to expose real coordinates to most users. The stored integers are an implementation detail of the LAS specification, not a general feature of point clouds.

Crghilardi commented 4 years ago

OK cool, that's a good idea to confirm it at the source! What (if anything) needs to be updated to move toward the "only expose real world coords to users" idea?

I see LasIO already has some convenience functions to convert one dimension of individual coordinates. It doesn't look like LazIO converts from integers to floats at any time currently?

I also saw your comment on the slack channel (copy pasted below):

Does anyone know of a package similar to FixedPointNumbers/FixedPointDecimals, but allowing an underlying integer to behave like a Real after application of not only a float scaling factor, but also a float offset? Or would it be more efficient to just greedily convert to Float64 using muladd, rather than storing the scale and offset in the type parameter?

Are there any known issues with calculating on the float values compared to the integers at this point?

evetion commented 4 years ago

This functionality is in essence an AffineMap, which could be interesting for speed and readability?

using CoordinateTransformations
using LinearAlgebra

# real-world = scale .* stored .+ offset, so apply the scaling first, then the translation
f = Translation(x_offset, y_offset, z_offset) ∘
    LinearMap(Diagonal([x_scale_factor, y_scale_factor, z_scale_factor]))

unscaled_point = f(scaled_point)
scaled_point = inv(f)(unscaled_point)

visr commented 4 years ago

Yeah, I think users should definitely have access to both the integer and float coordinates, but the floats will be less surprising to novice users and should perhaps be presented first. But those who know how the scaling works should be free to use them as well.

The reason I asked on Slack was that it would be great if we could get apparent real-world coordinates with the same precision and footprint (Int32 is half the size of Float64). In the common case that the offset is 0 and the scale is 0.001 (i.e. mm), we could already use FixedDecimal{Int32,3}(1.234). With such a type we could just reinterpret the data we already read.
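
A minimal sketch of that reinterpret idea (my illustration, assuming FixedPointDecimals is loaded and the offset really is 0):

using FixedPointDecimals

# raw Int32 coordinates as stored in the LAS file, with scale = 0.001 and offset = 0
raw = Int32[1234, 5678]

# zero-copy view: each Int32 is now read as raw / 10^3
coords = reinterpret(FixedDecimal{Int32,3}, raw)
Float64(coords[1])  # 1.234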

The AffineMap can indeed also do the scaling, but it does it for the whole point at once, similar to https://github.com/visr/LasIO.jl/blob/a8625c5ceb7c2189064793fd056c44d61515782d/src/point.jl#L212-L214

Are there any known issues with calculating on the float values compared to the integers at this point?

You have to be a little bit more careful about floating point accuracy (and, for integers, about overflow). Also, some algorithms only work well on integer data, like radix sort. Some geometrical operations can also be faster if you know that your data effectively lies on a fixed grid.

Crghilardi commented 4 years ago

Yeah, I think users should definitely have access to both the integer and float coordinates, but the floats will be less surprising to novice users and should perhaps be presented first. But those who know how the scaling works should be free to use them as well.

That sounds like a good plan. Where does that leave this PR and PointCloudRasterizers.index? Should I close this or is this still needed anywhere?

...it would be great if we could get apparent real-world coordinates with the same precision and footprint (Int32 is half the size of Float64). In the common case that the offset is 0 and the scale is 0.001 (i.e. mm), we could already use FixedDecimal{Int32,3}(1.234). With such a type we could just reinterpret the data we already read.

That does sound great! Do you have a sense of whether that is possible yet?

visr commented 4 years ago

Where does that leave this PR

Since this may be a useful function in some cases, why not just merge it, but not export it? See 0e82f07509c2b6c4668cae37b110a03cfc82268a. It would be nicer to merge this into the boundingbox function with a keyword argument, but then you make it type unstable; not sure if that's worth it.

and PointCloudRasterizers.index

Not sure, haven't critically looked at it yet. But best to discuss in that repo if something needs to be done.

Do you have a sense of whether that is possible yet?

Nope :) I mean, it should be possible, but we'd need to do a bunch of benchmarking to see how it compares. Perhaps it's only viable when built into a special array type, such that you essentially have a struct with a Vector{Int32} plus a scale and an offset.
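
To make that concrete, a minimal sketch of such an array type (ScaledVector is a hypothetical name, just to illustrate the idea):

# Hypothetical: raw LAS integers plus scale and offset, presented as Float64 on access
struct ScaledVector <: AbstractVector{Float64}
    raw::Vector{Int32}
    scale::Float64
    offset::Float64
end

Base.size(v::ScaledVector) = size(v.raw)
Base.IndexStyle(::Type{ScaledVector}) = IndexLinear()
Base.getindex(v::ScaledVector, i::Int) = muladd(Float64(v.raw[i]), v.scale, v.offset)

# e.g. ScaledVector(Int32[1234, 5678], 0.001, 0.0)[1] == 1.234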

Crghilardi commented 4 years ago

OK :+1: