visr / LasIO.jl

Julia package for reading and writing the LAS lidar format.
Other
22 stars 13 forks source link

Added streaming support, meta generation for IO functions #10

Closed evetion closed 6 years ago

evetion commented 6 years ago

Added streaming support as discussed in #4

codecov-io commented 6 years ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@437a8f4). Click here to learn what that means. The diff coverage is 44.89%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master      #10   +/-   ##
=========================================
  Coverage          ?   62.62%           
=========================================
  Files             ?        8           
  Lines             ?      396           
  Branches          ?        0           
=========================================
  Hits              ?      248           
  Misses            ?      148           
  Partials          ?        0
Impacted Files Coverage Δ
src/LasIO.jl 100% <ø> (ø)
src/fileio.jl 46.15% <100%> (ø)
src/meta.jl 4.54% <4.54%> (ø)
src/point.jl 64% <57.14%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 437a8f4...738c0f4. Read the comment docs.

visr commented 6 years ago

Nice :) The generated read is still slower for me, for you as well? Because in the linked StructIO issue read_written_out and generated_read show the same performance (also when I run it).

Original function
  85.615 ms (994614 allocations: 32.26 MiB)
New function using generated read
  105.606 ms (994614 allocations: 32.26 MiB)
New function using streaming
  97.013 ms (4477381 allocations: 144.24 MiB)
evetion commented 6 years ago

The generated read is still slower for me, for you as well?

I also experience this, cannot pinpoint it yet. Profiling does yield different structures, could be some sort of optimization somewhere.

With a true streaming system, you'd more typically provide an interface to read points individually or in chunks.

I can try to implement read(stream::IO, T, dims) in a true streaming fashion. I can imagine that the current constant offset calculation has some overhead.

evetion commented 6 years ago

➜ julia test/benchmark.jl Original function 80.815 ms (994614 allocations: 32.26 MiB) New function using generated read 97.666 ms (994614 allocations: 32.26 MiB) New function using streaming 76.465 ms (3482306 allocations: 60.73 MiB)

With the suggestions of @c42f, streaming is now a bit faster and uses even less memory. One thing I learned; seek(io, position) is 0 based(!).

One thing that's still missing is memory mapped saving:

The first is easiest to implement, we could write a load!() function for it. The second however is harder, our Points are immutable and saving is not yet memory mapped. A use case would include adding RGB values using images to a larger than memory pointcloud.

evetion commented 6 years ago
visr commented 6 years ago

Ok finally got around to looking at this a bit, sorry for the delay. Would prefer in the future if the unrelated changes are in separate PRs. Though the gen_io is definitely nice to have.

For now I got stuck in the REPL after loading a LasPoint3 with mmap=true:

julia> header, points = load("test/laspoint3.las", mmap=true)
(LasHeader with 2040564 points.
, LasIO.PointVector{LasIO.LasPoint3} with 2040564 points.
)

julia> p=first(points)
ERROR: MethodError: Cannot `convert` an object of type Type{FixedPointNumbers.Normed{UInt16,16}} to an object of type Array{UInt8,1}
This may have arisen from a call to the constructor Array{UInt8,1}(...),
since type constructors fall back to convert methods.
Stacktrace:
 [1] read(::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Type{T} where T) at .\io.jl:528
 [2] read(::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Type{LasIO.LasPoint3}) at .\<missing>:0
 [3] getindex(::LasIO.PointVector{LasIO.LasPoint3}, ::Int64) at C:\Users\visser_mn\.julia\v0.6\LasIO\src\point.jl:30
 [4] first(::LasIO.PointVector{LasIO.LasPoint3}) at .\abstractarray.jl:135

Note that in LasPoint2 and LasPoint3 reinterpret is used. How would you solve that in the gen_io macro? https://github.com/visr/LasIO.jl/blob/437a8f486c2c743f7df5f37e68dc0730eb983483/src/point.jl#L157-L159

evetion commented 6 years ago

Thanks for taking time to review and improving upon this PR. If you'd like, I split this in two separate PRs.

With regards to the FixedPointNumbers, I've provided a fix;

Base.read(io::IO, ::Type{N0f16}) = reinterpret(N0f16, read(io, UInt16))

Are you open to adding the laspoint3.las in your example to the repo? In theory we should test against every pointtype we support. We can keep these files very small with only a handful of points.

visr commented 6 years ago

Ah yeah that works, I assume we need a write counterpart as well? No need to split it now. Would indeed be better to test all the point types. Either a small test file in the repo, or we generate it ourselves.

evetion commented 6 years ago

I've added the write version. I propose we add the tests for all pointtypes in another PR. I can work on that.

visr commented 6 years ago

If possible, I'd even encourage removing the stream keyword, and always just using mmap for IO which support it.

Let's try this out a bit first, we can always still flip the switch, ok?

c42f commented 6 years ago

Sorry guys I haven't found time to come back to this (I'm not using las at all these days...)

Looks great to me in general :+1: