yeesian / ArchGDAL.jl

A high level API for GDAL - Geospatial Data Abstraction Library
https://yeesian.github.io/ArchGDAL.jl/stable/
Other
142 stars 26 forks source link

Table to IFeatureLayer #243

Open mathieu17g opened 3 years ago

mathieu17g commented 3 years ago

Here is a 1st draft to fix issue #152

I have implemented the conversion from table to an IFeatureLayer in a memory dataset. It is done via an IFeatureLayer constructor but it can be moved to a createlayer method.

Here are some modifications that has / may need to be implemented:

Other additions:

Here is a test on multi_geom.csv test file:

julia> ds = AG.read("multi_geom.csv",
           options = [
               "GEOM_POSSIBLE_NAMES=point,linestring",
               "KEEP_GEOM_COLUMNS=NO",
           ],
       )
       layer = AG.getlayer(ds, 0)
       @show layer
       df = DataFrame(layer)
       @show df; println()
       @show describe(df); println()
       new_layer = AG.IFeatureLayer(df)
       @show new_layer
       new_df = DataFrame(new_layer)
       @show new_df; println()
       @show describe(new_df); println()
layer = Layer: multi_geom
  Geometry 0 (point): [wkbUnknown]
  Geometry 1 (linestring): [wkbUnknown]
     Field 0 (id): [OFTString], 5.1, 5.2
     Field 1 (zoom): [OFTString], 1.0, 2.0
     Field 2 (location): [OFTString], Mumbai, New Delhi

df = 2×5 DataFrame
 Row │ point               linestring               id      zoom    location
     │ IGeometr…           IGeometry…               String  String  String
─────┼────────────────────────────────────────────────────────────────────────
   1 │ Geometry: wkbPoint  Geometry: wkbLineString  5.1     1.0     Mumbai
   2 │ Geometry: wkbPoint  Geometry: wkbLineString  5.2     2.0     New Delhi

describe(df) = 5×7 DataFrame
 Row │ variable    mean     min     median   max        nmissing  eltype
     │ Symbol      Nothing  Union…  Nothing  Union…     Int64     DataType
─────┼─────────────────────────────────────────────────────────────────────────────────────
   1 │ point                                                   0  IGeometry{wkbPoint}
   2 │ linestring                                              0  IGeometry{wkbLineString}
   3 │ id                   5.1              5.2               0  String
   4 │ zoom                 1.0              2.0               0  String
   5 │ location             Mumbai           New Delhi         0  String

new_layer = Layer: 
  Geometry 0 (point): [wkbPoint]
  Geometry 1 (linestring): [wkbLineString]
     Field 0 (id): [OFTString], 5.1, 5.2
     Field 1 (zoom): [OFTString], 1.0, 2.0
     Field 2 (location): [OFTString], Mumbai, New Delhi

new_df = 2×5 DataFrame
 Row │ point               linestring               id      zoom    location
     │ IGeometr…           IGeometry…               String  String  String
─────┼────────────────────────────────────────────────────────────────────────
   1 │ Geometry: wkbPoint  Geometry: wkbLineString  5.1     1.0     Mumbai
   2 │ Geometry: wkbPoint  Geometry: wkbLineString  5.2     2.0     New Delhi

describe(new_df) = 5×7 DataFrame
 Row │ variable    mean     min     median   max        nmissing  eltype
     │ Symbol      Nothing  Union…  Nothing  Union…     Int64     DataType
─────┼─────────────────────────────────────────────────────────────────────────────────────
   1 │ point                                                   0  IGeometry{wkbPoint}
   2 │ linestring                                              0  IGeometry{wkbLineString}
   3 │ id                   5.1              5.2               0  String
   4 │ zoom                 1.0              2.0               0  String
   5 │ location             Mumbai           New Delhi         0  String
yeesian commented 3 years ago

(MAYBE) define a function dataset(::AbstractFeatureLayer) instead of calling ownedby property.

I do think it's a good idea to move away from the ownedby property -- I have ideas/plans of moving away from it in a systematic approach for addressing #171 and #215.

I'll write up about it in a RFC and get thoughts and feedback first.

yeesian commented 3 years ago

Unfortunately, it'll be difficult to review this PR with https://github.com/yeesian/ArchGDAL.jl/pull/243/commits/38a23ef8fea0600da272e65fe1583511dc046620. If I still remember it right, I'd suggest

  1. keeping a copy of this branch (in case something goes wrong),
  2. deleting the merge commit https://github.com/yeesian/ArchGDAL.jl/pull/243/commits/38a23ef8fea0600da272e65fe1583511dc046620 (e.g. possibly in a git rebase interactive session)
  3. rebasing to master using git at the command line, resolve any conflicts, and
  4. force pushing to this branch
mathieu17g commented 3 years ago

OK as PR #245, I will close this PR and open a new one. Currently, I have rebase to abort locally that blocks me. I'm digging into it

  1. keeping a copy of this branch (in case something goes wrong),
  2. deleting the merge commit 38a23ef (e.g. possibly in a git rebase interactive session)
  3. rebasing to master using git at the command line, resolve any conflicts, and
  4. force pushing to this branch

Thanks, I did as advised below

mathieu17g commented 3 years ago

I have added a 1st draft for GeoInterface geometries handling

julia> using ArchGDAL; const AG = ArchGDAL
       using LibGEOS
       using DataFrames

julia> df = DataFrame(AG.getlayer(AG.read("tmp/road/road.shp"), 0))
       df = transform(
           df, 
           Symbol("") => 
           (x -> passmissing(LibGEOS.readgeom ∘ ArchGDAL.toWKT).(x)) => 
           Symbol("")
       )
       df[1:5, :]
5×8 DataFrame
 Row │                                    gid    roadcode     plod    absd     plof    absf     analyse             ⋯
     │ Abstract…?                         Int32  String       String  Float64  String  Float64  String              ⋯
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ LineString(Ptr{Nothing} @0x00007…      1  01A903905CD  DB1         0.0  FB1         0.0  2 - Bretelles autor ⋯
   2 │ LineString(Ptr{Nothing} @0x00007…      2  01A903905CD  DB2         0.0  FB2         0.0  2 - Bretelles autor
   3 │ LineString(Ptr{Nothing} @0x00007…      3  01A903905CD  DB3         0.0  FB3         0.0  2 - Bretelles autor
   4 │ LineString(Ptr{Nothing} @0x00007…      4  01A903905CD  DB4         0.0  FB4         0.0  2 - Bretelles autor
   5 │ LineString(Ptr{Nothing} @0x00007…      5  01A903910CD  DB1         0.0  FB1         0.0  2 - Bretelles autor ⋯
                                                                                                     1 column omitted

julia> AG.IFeatureLayer(df; name="roads")
Layer: roads
  Geometry 0 (): [wkbUnknown], LINESTRING (874644.4...), ...
     Field 0 (gid): [OFTInteger], 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, ...
     Field 1 (roadcode): [OFTString], 01A903905CD, 01A903905CD, 01A903905CD, ...
     Field 2 (plod): [OFTString], DB1, DB2, DB3, DB4, DB1, DB2, DB1, DB2, ...
     Field 3 (absd): [OFTReal], 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
     Field 4 (plof): [OFTString], FB1, FB2, FB3, FB4, FB1, FB2, FB1, FB2, ...
...
 Number of Fields: 7
mathieu17g commented 3 years ago

Voilà, it seems to work

GeoInterface geometries

julia> using ArchGDAL; const AG = ArchGDAL
       using LibGEOS
       using DataFrames

julia> nt = NamedTuple([
           :point => [LibGEOS.readgeom("POINT (30 10)"), missing, LibGEOS.readgeom("POINT (3 7)")],
           :id => [nothing, "5.1", "5.2"]
       ])
(point = Union{Missing, Point}[Point(Ptr{Nothing} @0x00007f87e1571600), missing, Point(Ptr{Nothing} @0x00007f87e15a7bf0)], id = Union{Nothing, String}[nothing, "5.1", "5.2"])

julia> AG.IFeatureLayer(nt)
Layer: layer
  Geometry 0 (point): [wkbPoint], POINT (30 10), NULL Geometry, POINT (3 7)
     Field 0 (id): [OFTString], nothing, 5.1, 5.2

julia> DataFrame(ans)
3×2 DataFrame
 Row │ point               id     
     │ IGeometr…?          Union… 
─────┼────────────────────────────
   1 │ Geometry: wkbPoint         
   2 │ missing             5.1
   3 │ Geometry: wkbPoint  5.2

WKT geometries

julia> nt = NamedTuple([
           :point => ["POINT (30 10)", missing, "POINT (3 7)"],
           :id => [nothing, "5.1", "5.2"]
       ])
(point = Union{Missing, String}["POINT (30 10)", missing, "POINT (3 7)"], id = Union{Nothing, String}[nothing, "5.1", "5.2"])

julia> AG.IFeatureLayer(nt; parseWKT=true)
Layer: layer
  Geometry 0 (point): [wkbPoint], POINT (30 10), NULL Geometry, POINT (3 7)
     Field 0 (id): [OFTString], nothing, 5.1, 5.2

julia> DataFrame(ans)
3×2 DataFrame
 Row │ point               id     
     │ IGeometr…?          Union… 
─────┼────────────────────────────
   1 │ Geometry: wkbPoint         
   2 │ missing             5.1
   3 │ Geometry: wkbPoint  5.2

WKB geometries

julia> nt = NamedTuple([
           :point => [AG.toWKB(AG.createpoint(30, 10)), missing, AG.toWKB(AG.createpoint(3, 7))],
           :id => [nothing, "5.1", "5.2"]
       ])
(point = Union{Missing, Vector{UInt8}}[UInt8[0x01, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00  …  0x3e, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x24, 0x40], missing, UInt8[0x01, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00  …  0x08, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x1c, 0x40]], id = Union{Nothing, String}[nothing, "5.1", "5.2"])

julia> AG.IFeatureLayer(nt; parseWKB=true)
Layer: layer
  Geometry 0 (point): [wkbPoint], POINT (30 10), NULL Geometry, POINT (3 7)
     Field 0 (id): [OFTString], nothing, 5.1, 5.2

julia> DataFrame(ans)
3×2 DataFrame
 Row │ point               id     
     │ IGeometr…?          Union… 
─────┼────────────────────────────
   1 │ Geometry: wkbPoint         
   2 │ missing             5.1
   3 │ Geometry: wkbPoint  5.2

Notes:

mathieu17g commented 2 years ago

@yeesian I dropped fieldtypes kwarg in src/ and test/ to check if it still works properly before stashing all the WKT/WKB parsing and geomcols kwarg

mathieu17g commented 2 years ago

@yeesian, in commit e8560e4, I dropped geomcols kwarg in src/ and test/

Note: Coverage is still impaired by codecov coverage not handling generated code properly. When I test coverage locally with Pkg.jl and Coverage.jl everything new is covered

yeesian commented 2 years ago

Wanted to say I like the new API a lot, thank you so much for working through all the review comments!

ErickChacon commented 1 year ago

It would very nice to have this feature. It would make integration with other geometry packages easier. Is there any reason why this effort stopped? Is there still interest on this feature?