visr / LasIO.jl

Julia package for reading and writing the LAS lidar format.

[WIP] First DataStreams version #13

Open evetion opened 6 years ago

evetion commented 6 years ago

Here is a WIP for implementing DataStreams. Still rough around the edges, but I'd like some feedback for the overall direction/API.

What

DataStreams seems to be the way to go in the Julia data ecosystem: it enables streaming conversions between, for example, CSV, SQLite, DataFrames, and DataTables. With it, users can easily read LAS files into DataFrames and write them back, without needing to know any raw header or point details.

Why

This addresses most of the comments from @c42f in #4 for a new API and v0.1

TODO

For using the Source:

For using the Sink:

Discussion

Using the Sink needs some discussion. Writing currently works only when the Source columns match a LasPoint exactly. Do we fill in the missing columns, and thus allow an invalid point type? The xyz coordinates will also arrive as floats, which we need to scale and offset. Doing this afterwards is detrimental to performance. I would propose for float input:

Further improvements can be made to the point types. Since this implementation hardly uses them (only to look up attributes and types when creating the Schema), we could explode some attributes, such as the flag_byte, into their individual components for better accessibility. I'm not sure what this would do to performance, though.
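
To make the idea of exploding flag_byte concrete, here is a minimal sketch, assuming the LAS 1.0-1.3 bit layout for point formats 0-5 (return number in bits 0-2, number of returns in bits 3-5, scan direction flag in bit 6, edge of flight line in bit 7). The function names are hypothetical and not part of LasIO.jl.

```julia
# Hypothetical helpers, not part of LasIO.jl: unpack the flag byte of
# LAS point formats 0-5 into its four components.
return_number(flag::UInt8)       = Int(flag & 0x07)         # bits 0-2
number_of_returns(flag::UInt8)   = Int((flag >> 3) & 0x07)  # bits 3-5
scan_direction(flag::UInt8)      = (flag & 0x40) != 0       # bit 6
edge_of_flight_line(flag::UInt8) = (flag & 0x80) != 0       # bit 7

# 0x30 is the flag_byte of every point in test/srs.las.
flag = 0x30
println((return_number(flag), number_of_returns(flag),
         scan_direction(flag), edge_of_flight_line(flag)))
```

For 0x30 this gives return number 0 and number of returns 6, which agrees with the lasinfo report for the demo file. Whether such columns should be computed on the fly or stored separately is exactly the performance question raised above.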

Demo

julia> s = LasIO.Source("test/srs.las")
LasIO.Source(Data.Schema:
rows: 10  cols: 10
Columns:
 "x"                   Int32  
 "y"                   Int32  
 "z"                   Int32  
 "intensity"           UInt16 
 "flag_byte"           UInt8  
 "raw_classification"  UInt8  
 "scan_angle"          Int8   
 "user_data"           UInt8  
 "pt_src_id"           UInt16 
 "gps_time"            Float64, LasHeader with 10 points.
, IOStream(<file test/srs.las>), "test/srs.las", 759)

julia> d = Data.stream!(s, DataFrame)
DataFrames.DataFrameStream{Tuple{Array{Int32,1},Array{Int32,1},Array{Int32,1},Array{UInt16,1},Array{UInt8,1},Array{UInt8,1},Array{Int8,1},Array{UInt8,1},Array{UInt16,1},Array{Float64,1}}}((Int32[28981415, 28981464, 28981512, 28981560, 28981608, 28981656, 28981703, 28981753, 28981801, 28981850], Int32[432097861, 432097884, 432097906, 432097928, 432097950, 432097971, 432097992, 432098016, 432098038, 432098059], Int32[17076, 17076, 17075, 17074, 17068, 17066, 17063, 17062, 17061, 17058], UInt16[0x0104, 0x0118, 0x0118, 0x0118, 0x0104, 0x00f0, 0x00f0, 0x0118, 0x0118, 0x0104], UInt8[0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30], UInt8[0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02], Int8[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], UInt8[0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00], UInt16[0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000], [4.99451e5, 4.99451e5, 4.99451e5, 4.99451e5, 4.99451e5, 4.99451e5, 4.99451e5, 4.99451e5, 4.99451e5, 4.99451e5]), String["x", "y", "z", "intensity", "flag_byte", "raw_classification", "scan_angle", "user_data", "pt_src_id", "gps_time"])

julia> Data.close!(d)
10×10 DataFrames.DataFrame
│ Row │ x        │ y         │ z     │ intensity │ flag_byte │ raw_classification │ scan_angle │ user_data │ pt_src_id │ gps_time  │
├─────┼──────────┼───────────┼───────┼───────────┼───────────┼────────────────────┼────────────┼───────────┼───────────┼───────────┤
│ 1   │ 28981415 │ 432097861 │ 17076 │ 0x0104    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 2   │ 28981464 │ 432097884 │ 17076 │ 0x0118    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 3   │ 28981512 │ 432097906 │ 17075 │ 0x0118    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 4   │ 28981560 │ 432097928 │ 17074 │ 0x0118    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 5   │ 28981608 │ 432097950 │ 17068 │ 0x0104    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 6   │ 28981656 │ 432097971 │ 17066 │ 0x00f0    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 7   │ 28981703 │ 432097992 │ 17063 │ 0x00f0    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 8   │ 28981753 │ 432098016 │ 17062 │ 0x0118    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 9   │ 28981801 │ 432098038 │ 17061 │ 0x0118    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 10  │ 28981850 │ 432098059 │ 17058 │ 0x0104    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │

julia> Data.reset!(s)

julia> d = Data.stream!(s, LasIO.Sink, "test_final.las")
Stream now at 227
LasIO.Sink{LasIO.LasPoint1}(IOStream(<file test_final.las>), LasHeader with 10 points.
, LasIO.LasPoint1)

julia> Data.close!(d)
LasIO.Sink{LasIO.LasPoint1}(IOStream(<file test_final.las>), LasHeader with 10 points.
, LasIO.LasPoint1)
➜ lasinfo test_final.las
lasinfo (170528) report for test_final.las
reporting all LAS header entries:
  file signature:             'LASF'
  file source ID:             0
  global_encoding:            0
  project ID GUID data 1-4:   00000000-0000-0000-2020-202020202020
  version major.minor:        1.0
  system identifier:          'LasIO.jl datastream             '
  generating software:        'LasIO.jl                        '
  file creation day/year:     10/2018
  header size:                227
  offset to point data:       227
  number var. length records: 0
  point data format:          1
  point data record length:   28
  number of point records:    10
  number of points by return: 10 0 0 0 0
  scale factor x y z:         0.01 0.01 0.01
  offset x y z:               0 0 0
  min x y z:                  289814.15 4320978.61 170.58
  max x y z:                  289818.50 4320980.59 170.76
reporting minimum and maximum for all LAS point record entries ...
  X            28981415   28981850
  Y           432097861  432098059
  Z               17058      17076
  intensity         240        280
  return_number       0          0
  number_of_returns   6          6
  edge_of_flight_line 0          0
  scan_direction_flag 0          0
  classification      2          2
  scan_angle_rank     0          0
  user_data           0          0
  point_source_ID     0          0
  gps_time 499450.805994 499450.806120
number of first returns:        10
number of intermediate returns: 0
number of last returns:         0
number of single returns:       0
WARNING: for return 1 real number of points by return (0) is different from header entry (10).
WARNING: there are 10 points with return number 0
overview over number of returns of given pulse: 0 0 0 0 0 10 0
histogram of classification of points:
              10  ground (2)
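
The lasinfo header above reports a scale factor of 0.01 and an offset of 0, which is what ties the raw Int32 x, y, z columns to the min/max coordinates it prints. As a minimal sketch of the conversion a Sink would need for float input (the names `to_real` and `to_raw` are hypothetical, not LasIO.jl functions):

```julia
# Hypothetical helpers, not part of LasIO.jl.
# LAS stores coordinates as Int32; the header holds one scale and one
# offset per axis, so that: real = raw * scale + offset
to_real(raw::Integer, scale::Float64, offset::Float64) = raw * scale + offset

# Inverse conversion, applied per value while streaming so the floats
# do not need a second pass. round(Int32, ...) throws an InexactError
# if the result does not fit in an Int32.
to_raw(real::Float64, scale::Float64, offset::Float64) =
    round(Int32, (real - offset) / scale)

# Scale and offset as reported by lasinfo for test_final.las.
scale, offset = 0.01, 0.0
x_raw = Int32(28981415)          # first x value in the demo DataFrame
x = to_real(x_raw, scale, offset)
println(x)
println(to_raw(x, scale, offset) == x_raw)
```

Choosing the scale and offset for arbitrary float input is a separate design decision that this sketch does not address.
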
evetion commented 6 years ago

I think this also comes close to the comment at https://github.com/FugroRoames/PointClouds.jl:

Perhaps one day PointCloud can be implemented in terms of an underlying DataFrame [..]

julia> s = LasIO.Source("test/srs.las")
julia> d = Data.stream!(s, DataFrame)
julia> d = Data.close!(d)
julia> d[:intensity]
10-element Array{UInt16,1}:
 0x0104
 0x0118
 0x0118
 0x0118
 0x0104
 0x00f0
 0x00f0
 0x0118
 0x0118
 0x0104
visr commented 5 years ago

I believe that now, a year later, it makes more sense to add support for the new Tables.jl interface instead. Perhaps it would be good to focus on getting #16 in first, and then revisit this? #16 will also affect the API.

evetion commented 5 years ago

Not sure about this clashing with #16, these are two separate approaches in my opinion.

visr commented 5 years ago

What do you mean by two separate approaches? As in, we should have one or the other? I thought we could have both, right?

In any case, it might be good to try to get LAS 1.3 and 1.4 support in first, as those versions are becoming increasingly common.