visr / LasIO.jl

Julia package for reading and writing the LAS lidar format.
Other
22 stars 13 forks source link

API for returning scaled and offset point cloud #34

Open Crghilardi opened 4 years ago

Crghilardi commented 4 years ago

I was messing around and ran into a situation where I wanted to get the scaled and offset float point cloud to use with NearestNeighbors.jl.

I can use the the convert function provided by the package, but then I have an Array{Point{3,Float64},1} and lose access to all the metadata and other convenience functions like classification().

using FileIO, LasIO
header, points = load("points.las")

using GeometryTypes
pc_scaled = map(x-> Base.convert(Point{3, Float64},x, header), points) #Array{Point{3,Float64},1}

The code above works fine for my toy problem, but I wanted to ask about design thoughts for adding this.

The ability to expose scaled and offset coords to users was discussed briefly in this LasIO issue.

After reading through src a bit, it looks like it would at least require a new type at least since all the x,y,z fields are ::Int? Not sure if any of the work over in GeometryBasics/GeometryTypes can be leveraged to simplify anything.

Maybe related issue in PointCloudRasterizers? and I also know about GeoAcceleratedArrays.

c42f commented 4 years ago

There's also some mention of incorporating scaling/offset into the type at https://github.com/visr/LasIO.jl/issues/4. Ideally I think the las positions should includes the affine transformation in their type. You might be able to get away with something as "simple" (ok, at least short) as the following:

using StaticArrays, CoordinateTransformations, LinearAlgebra

struct LasPoint{Trans} <: StaticVector{3,Float64}
    data::NTuple{3,Int32}
end

transform_of(p::LasPoint{Trans}) where Trans = Trans

function Base.getindex(p::LasPoint, i::Int)
    transform_of(p)(SVector(p.data))[i]
end

example usage:

julia> trans = AffineMap(Diagonal(SA_F64[1 0 0; 0 2 0; 0 0 3]), SA_F64[10, 20, 30])
AffineMap([1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0], [10.0, 20.0, 30.0])

julia> p = LasPoint{trans}(1,2,3)
3-element LasPoint{AffineMap([1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0], [10.0, 20.0, 30.0])} with indices SOneTo(3):
 11.0
 24.0
 39.0

The Julia compiler generates astonishingly good code with this version of getindex. For example, the first component of the diagonal scaling is 1 in this case, so the multiplication is optimized away!

julia> foo(p) = p[1]
foo (generic function with 1 method)

julia> @code_native debuginfo=:none foo(p)
    .text
    vcvtsi2sdl  (%rdi), %xmm0, %xmm0
    movabsq $140660719895864, %rax  # imm = 0x7FEE203E4538
    vaddsd  (%rax), %xmm0, %xmm0
    retq
    nopw    %cs:(%rax,%rax)
    nopl    (%rax)

To get the underlying Int32 components you always have reinterpret:

julia> points = LasPoint{trans}[(1,2,3), (4,5,6)]
2-element Array{LasPoint{AffineMap([1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0], [10.0, 20.0, 30.0])},1}:
 [11.0, 24.0, 39.0]
 [14.0, 30.0, 48.0]

julia> reinterpret(NTuple{3,Int32}, points)
2-element reinterpret(Tuple{Int32,Int32,Int32}, ::Array{LasPoint{AffineMap([1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0], [10.0, 20.0, 30.0])},1}):
 (1, 2, 3)
 (4, 5, 6)

Actually most of the above LasPoint is completely generic and in principle could be defined in CoordinateTransformations. The main difficulty for a fully generic version is the eltype of the point (it depends on both Trans and the eltype of the wrapped data in general and that would need to be explicitly written in the type signature which is slightly annoying.)

c42f commented 4 years ago

Thinking more about this, I suspect we don't need to go inventing new things because the pieces already exist in CoordinateTransformations and MappedArrays:

julia> points_int32 = SVector{3,Int32}[(1,2,3), (4,5,6)]
2-element Array{SArray{Tuple{3},Int32,1,3},1}:
 [1, 2, 3]
 [4, 5, 6]

julia> trans = AffineMap(Diagonal(SA_F64[1 0 0; 0 2 0; 0 0 3]), SA_F64[10, 20, 30])
AffineMap([1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0], [10.0, 20.0, 30.0])

julia> points = mappedarray(trans, points_int32)
2-element mappedarray(AffineMap([1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0], [10.0, 20.0, 30.0]), ::Array{SArray{Tuple{3},Int32,1,3},1}) with eltype SArray{Tuple{3},Float64,1,3}:
 [0.0, 0.0, 0.0]
 [14.0, 30.0, 48.0]

MappedArrays is just what you need: you can even assign into the array of fixed-point coordinates by supplying an inverse transformation:

julia> points = mappedarray(trans, (x->round.(Int32,x)) ∘ inv(trans), points_int32)
2-element mappedarray(AffineMap([1.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 3.0], [10.0, 20.0, 30.0]), Base.var"#64#65"{var"#9#10",AffineMap{Diagonal{Float64,SArray{Tuple{3},Float64,1,3}},SArray{Tuple{3},Float64,1,3}}}(var"#9#10"(), AffineMap([1.0 0.0 0.0; 0.0 0.5 0.0; 0.0 0.0 0.3333333333333333], [-10.0, -10.0, -10.0])), ::Array{SArray{Tuple{3},Int32,1,3},1}) with eltype SArray{Tuple{3},Float64,1,3}:
 [11.0, 24.0, 39.0]
 [14.0, 30.0, 48.0]

julia> points[1] = SA_F64[0,0,0]
3-element SArray{Tuple{3},Float64,1,3} with indices SOneTo(3):
 0.0
 0.0
 0.0

julia> points_int32
2-element Array{SArray{Tuple{3},Int32,1,3},1}:
 [-10, -10, -10]
 [4, 5, 6]
Crghilardi commented 4 years ago

That's pretty cool and seems to consistent in spirit with some of the other work using AffineMaps. All the work in JuliaGeometry seems to be using some form of StaticArray under the hood for point-like things so that should all play nicely with what you've outlined above right?

My original thought was to modify the load/open functions to add a flag to let users specify if they want the Int or Float version when reading rather than read and then transform.

The thing I am still thinking about is trickiness between versions of LAS spec versions. Some discussion here

Is a LasPoint be something special or is it just a point with some extra metadata fields(dependent on version)? Is there value in some generic supertype like PointClouds.jl?

EDIT: played around with it over the weekend, here is an example using the test data from LasIO

using FileIO, LasIO, MappedArrays, StaticArrays, LinearAlgebra, CoordinateTransformations, GeometryTypes

lasfn = joinpath(dirname(pathof(LasIO)), "..", "test/libLAS_1.2.las")
header, points = load(lasfn)

translation = AffineMap(Diagonal(SA_F64[header.x_scale 0 0; 0 header.y_scale 0; 0 0 header.z_scale]), SA_F64[header.x_offset, header.y_offset, header.z_offset])

extract(p::LasIO.LasPoint) = [p.x, p.y, p.z]
pts_int = [extract(i) for i in points]

ma_pts = mappedarray(translation,pts_int)

#test to make sure it matches convert(points)
pc_scaled = map(x-> Base.convert(Point{3, Float64},x, header), points)
ma_pts == pc_scaled # true
c42f commented 4 years ago

I believe with MappedArrays we can simply map RawLasPoint (fictional name) to LasPoint, and have LasPoint contain a SVector{3,Float64} in the correct geospatial coordinate system. I don't think any of the other fields will need to be mapped (?) Both RawLasPoint and LasPoint would share most metadata fields so those would be a simple copy and the compiler is quite good at optimizing away such copying code. We'd need to check efficiency.

So there's two rather different approaches to this stuff:

The MappedArray is more flexible, because it allows you to have more complex transformations which are not immutable structs. Sure you can get away with putting the transformation in the point type if it's a simple scale and offset. But sometimes this is not the case, for example

The MappedArray is also easier to implement. So I'd say that as long as it's efficient in practice it may be the better option to return from load by default.

I do believe that Images.jl got it completely correct when they used types like Array{ColorTypes.RGB{FixedPointNumbers.Normed{UInt8,8}} to represent image channel data semantically as Real numbers rather than integers. And we should do the same for geospatial data, even though the mechanism may be slightly different.

c42f commented 4 years ago

Is there value in some generic supertype like PointClouds.jl?

PointClouds.jl is quite abandoned now, but it's a good question.

For completely generic processing of point cloud data (not specific to LAS), I think presenting a tabular data structure of some form is the way to go. The various LAS formats are inherently an array-of-structs on disk, so it might be best to have a TypedTables.Table-like thing which overlays an internal row storage mechanism rather than the column storage used in TypedTables. Then have that implement the Tables.jl interface in an efficient way so that people can write generic processing code which works on both things like TypedTables.Table and on row-based LAS data.

Crghilardi commented 4 years ago

I have seen a few packages in the python world take the "points as a table" stance and seems to work well (geopandas also seems to be getting a reasonable amount of attention/love recently?).

Based on what you describe, at what point does it become a task of essentially re-creating RoamesGeometry/ Roames ecosystem?

There seems to be some convergence in terms of design ideas:

Also, thanks for engaging with me. It's nice to talk things out.

c42f commented 4 years ago

Based on what you describe, at what point does it become a task of essentially re-creating RoamesGeometry/ Roames ecosystem?

Interesting question though I'm not sure I know how to answer it.

I'd say that we didn't fully realized these ideas while I was still at Roames ­— by the stage we had enough experience to want the above design, we had a lot of ad-hoc processing code in C++ and other languages, plus systems/people dedicated to working on that. Julia point cloud infrastructure was a relative latecomer to this mix. So I'm not sure there's exactly anything to "recreate" here; pretty much all the generically applicable Roames tooling is openly available. The bits which work and are more maintained (StaticArrays, TypedTables, AcceleratedArrays, Displaz ......) can be used. The bits which are somewhat incomplete and unmaintained like PointClouds.jl never really got critical mass in the first place.

I haven't been closely followed developments in the geo/point cloud ecosystem for a while, so at this stage I'm not very qualified to know where the ecosystem is heading! But I do know that arrays of structs are a bit of a disappointing abstraction in general.

Crghilardi commented 4 years ago

The various LAS formats are inherently an array-of-structs on disk, so it might be best to have a TypedTables.Table-like thing which overlays an internal row storage mechanism rather than the column storage used in TypedTables

Would something like that be able to be added to TypedTables as it currently exists or would that be an entirely new API/package?

I clicked around for a few days looking for other rowwise projects and there doesn't seem to be much.

I was able to find RowTables that builds on normal DataFrames but it doesn't seem to be type stable.

I also saw a few mentions to IndexedTables as a way to get good performance with row-wise. It would be cool if we could load straight to JuliaDB and have all the out of core stuff work out of the box

c42f commented 4 years ago

Yeah I found RowTables.jl as well, but it doesn't look applicable. I also asked on slack, but to no avail so I think you'd have to build a new table type. Probably in a new package, though TypedTables could possibly be a home for it if there's really a lot of commonality.

evetion commented 4 years ago

I'm a bit late to the party, but my 2 cents:

It seems that JuliaGeo is moving towards traits instead of (super)types. See https://github.com/yeesian/GeoInterfaceRFC.jl/. So I would vote against a new pointcloud supertype. I've dabbled with having special Pointcloud traits (has_intensity, etc), for https://github.com/Deltares/PointCloudFilters.jl (very much a WIP), as there are many different sources (LasIO, LazIO, random csv or HDF5). I'll be hacking away at this approach this summer (August mostly).

With regards to the scaling/offset, I hardcoded that in an old PR, see #13, both for reading/writing. But I very much like the mappedarray approach, although I'm still in doubt if I want that as one "point" column or as three separate columns. But that's a minor detail, approach seems fine. It's always great when you find, even with a small ecosystem, that the tooling already exists and can be easily applied.

c42f commented 4 years ago

I'm still in doubt if I want that as one "point" column or as three separate columns

In my experience I want it as a point column for work within julia, but for interop with other systems, you may need it as separate columns. (Essentially because other systems don't have powerful enough type systems to talk about points in their own right.)

So having a neat way to wrap several columns into one poin at input and unwrap them into several components for output is good.