yeesian / ArchGDAL.jl

A high level API for GDAL - Geospatial Data Abstraction Library
https://yeesian.github.io/ArchGDAL.jl/stable/
Other
142 stars 26 forks source link

Enhance ArchGDAL OGR objects with parametric composite types #266

Open mathieu17g opened 2 years ago

mathieu17g commented 2 years ago

Here is a draft for review, demonstrating:

  1. the use of ogr_f_stealgeometry for a gain of an additionnal 7% gain on layer to table conversion when there is only one geometry field
  2. a way to dispatch on type in getfield (to fix issue #246) which brings 5% performance gain to the latter

Performance results Current master

julia> @benchmark Tables.columntable(AG.getlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30
BenchmarkTools.Trial: 131 samples with 1 evaluation.
 Range (min … max):  194.701 ms … 341.328 ms  ┊ GC (min … max): 0.00% … 14.57%
 Time  (median):     210.470 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   229.013 ms ±  38.876 ms  ┊ GC (mean ± σ):  3.38% ±  5.16%

    ▅▁█▅█▁▂ ▁                                                    
  ▄▆███████▃█▅▅▃▄▄▄▄▃▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▃▃▅▃▄▁▅▃▁▆▁▅▃▁▄▁▁▃ ▃
  195 ms           Histogram: frequency by time          326 ms <

 Memory estimate: 8.98 MiB, allocs estimate: 373869.

With Shapefile.jl

julia> @benchmark Tables.columntable(Shapefile.Table("../../tmp/road/road.shp")) seconds = 30
BenchmarkTools.Trial: 187 samples with 1 evaluation.
 Range (min … max):  148.703 ms … 179.867 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     160.331 ms               ┊ GC (median):    5.03%
 Time  (mean ± σ):   160.476 ms ±   6.494 ms  ┊ GC (mean ± σ):  3.30% ± 3.16%

       ▂   ▅  ▃    ▅▂▆▂        ▃             █ ▂▃ ▂              
  ▄▁▄▅▄██▅██▇▄██▅████████▇▅▄▄▄▅█▅██▇██▇▅▁▅▇█▅█▇██▅███▄▄▇▄▁▅▄▁▁▄ ▄
  149 ms           Histogram: frequency by time          174 ms <

 Memory estimate: 35.89 MiB, allocs estimate: 243040.

Current master with ogr_f_stealgeometry used in getcolumn

ulia> @benchmark Tables.columntable(AG.getlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30
BenchmarkTools.Trial: 140 samples with 1 evaluation.
 Range (min … max):  180.622 ms … 344.234 ms  ┊ GC (min … max): 0.00% … 14.01%
 Time  (median):     196.758 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   215.515 ms ±  45.714 ms  ┊ GC (mean ± σ):  3.48% ±  5.50%

   ▁▁ ▅▃█                                                        
  ▃████████▆▁▆▃▃▁▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▃▁▁▃▃▄▄▃▄▃▃▃▁▁▃ ▃
  181 ms           Histogram: frequency by time          337 ms <

 Memory estimate: 8.98 MiB, allocs estimate: 373879.

With new OGR parametric types and with ogr_f_stealgeometry used in getcolumn

julia> @benchmark Tables.columntable(AG.getFDPlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30
BenchmarkTools.Trial: 147 samples with 1 evaluation.
 Range (min … max):  171.544 ms …   1.002 s  ┊ GC (min … max): 0.00% … 18.02%
 Time  (median):     187.186 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   204.451 ms ± 72.860 ms  ┊ GC (mean ± σ):  2.00% ±  3.25%

    ▃█▂▄                                                        
  ▅▆████▇▅▄▃▁▃▃▃▄▁▁▁▁▃▃▃▄▃▃▅▄▃▃▃▁▁▁▁▁▁▁▁▁▃▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▃
  172 ms          Histogram: frequency by time          359 ms <

 Memory estimate: 7.75 MiB, allocs estimate: 279381.

1. ArchGDAL OGR related types moved to parametric composite types

Trying to dispatch on types required:

1.1 Creation of two parametric singleton types for fields and geomfields, and of a featuredefn like type alias

Note 1: I have added convert functions from FType and DataType which yields a more readable correspondance between Julia DataTypes and GDAL Field type ids, than the current separated convert functions from OGRFieldType and OGRFieldSubType to DataType

Note 2: I have also added a commented draft of new Geometry parametric singleon type (named Geom) based on GType but not used it since I have not identified any benefit

1.2 Duplication of most of OGR related composite types and parameterization with FType, GType and FDType

The duplicated types have the FDP prefix

All of them have been set as subtypes of corresponding abstract supertypes DUAL_AbstractXXX, in order to limit the duplication of functions code when not necessary.

Note: A plain FeatureDefn parametric composite type could later be used as type parameter for Feature and FeatureLayer

1.3 Modification of a few functions, used to show a layer and convert it to a table

Note: In the event of the unlikely use case where there would be too many generated functions (a lot of different data sources with different featuredefn) for compilation to be performed in a reasonable time or even to succeed, the generated functions could be moved to optionally-generated functions. The fallbacks being the legacy functions

1.4 Side additions

  1. I have added a macro to export Enum type id for user convenience. Disabled in the PR because it is not convenient when developping using revise.jl because it redefines the related constants. I have not found a way to inhibit the latter
  2. I have generalized the ownedby field in feature layer related types which may serve as a basis to fix issue #215, but @yeesian is tackling this one. For all intents and purposes here are below the relationships which are proposed:
    Ownership relations graph OGR Julia objects ownership relations diagram V1

    destroy functions for FDP_XXX types have been created accordingly

2. Steal geometry instead of cloning for the first geometry in features

There appears that Tables.rows(layer) makes a copy of the layer.
Since ogr_f_stealgeometry is faster than the composition of ogr_f_getgeomfieldref and ogr_g_clone, I have modified Tables.getcolumn to use a stealgeom function instead of the current getgeom function.

Unfortunately, ogr_f_getgeomfieldref is implemented in GDAL C interface only for the default (first) geometry. Datasources with only one geometry per feature is the general case, but that could be an issue to raise in GDAL.

3. Further performance optimizations to investigate, regarding the layer to table conversion performance

mathieu17g commented 2 years ago
  • Include field and geomfield names in feature definition type FDType to avoid calling ogr_f_getfieldindex

I have changed the FeatureDefn type parameter of ArchGDAL OGR objects from

FDType = Tuple{NTuple{NG,GType} where NG,NTuple{NF,FType} where NF}

which is the same as

FDType = Tuple{
    Tuple{Vararg{GType, NG}} where NG, 
    Tuple{Vararg{FType, NF}} where NF,
}

to (thanks to https://discourse.julialang.org/t/parametric-type-with-a-constrained-namedtuple/34511)

FDType = Tuple{
    NamedTuple{NG,<:Tuple{Vararg{GType}}} where NG,
    NamedTuple{NF,<:Tuple{Vararg{FType}}} where NF,
}
  1. a way to dispatch on type in getfield (to fix issue #246) which brings 5% performance gain to the latter

The total performance gain with with the addition of avoiding most of the calls to ogr_f_getfieldindex is now 10% compared to 1.

Current master:

@benchmark Tables.columntable(AG.getlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30
BenchmarkTools.Trial: 142 samples with 1 evaluation.
 Range (min … max):  182.404 ms … 335.852 ms  ┊ GC (min … max): 0.00% … 10.29%
 Time  (median):     196.571 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   213.022 ms ±  37.221 ms  ┊ GC (mean ± σ):  3.27% ±  4.88%

   ▃█▅▄▂▄  ▄                                                     
  ▆██████▇██▇▃▃▁▃▁▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▁▁▃▃▅▄▇▁▃▄▆▃▅▃▁▁▁▁▁▁▁▁▁▃ ▃
  182 ms           Histogram: frequency by time          318 ms <

 Memory estimate: 8.98 MiB, allocs estimate: 373869.

Shapefile.jl:

julia> @benchmark Tables.columntable(Shapefile.Table("../../tmp/road/road.shp")) seconds = 30BenchmarkTools.Trial: 198 samples with 1 evaluation.
 Range (min … max):  142.824 ms … 163.894 ms  ┊ GC (min … max): 0.00% … 5.48%
 Time  (median):     152.970 ms               ┊ GC (median):    4.10%
 Time  (mean ± σ):   152.064 ms ±   4.070 ms  ┊ GC (mean ± σ):  2.95% ± 2.46%

                  ▃▂▂▄ ▄               ▇▂█▂▄▄▅                   
  ▃▁▁▃▁▃▃▅▆▅▆▆▅█▇█████▅██▃▃▅▃▃▅▁█▆██▆▇████████▆▃▆▇▇▆▅▅▁▁▁▃▃▁▁▁▃ ▃
  143 ms           Histogram: frequency by time          161 ms <

 Memory estimate: 35.89 MiB, allocs estimate: 243040.

Current master with geometry stealing:

julia> @benchmark Tables.columntable(AG.getlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30
BenchmarkTools.Trial: 144 samples with 1 evaluation.
 Range (min … max):  178.243 ms … 312.567 ms  ┊ GC (min … max): 0.00% … 13.64%
 Time  (median):     191.755 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   208.443 ms ±  40.801 ms  ┊ GC (mean ± σ):  3.39% ±  5.23%

    █▂▁▆▅▂▂                                                      
  ▃████████▅▄▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▁▁▃▃▃▄▃▅▁▃▄▃▃▃▃ ▃
  178 ms           Histogram: frequency by time          312 ms <

 Memory estimate: 8.98 MiB, allocs estimate: 373879.

New ArchGDAL OGR types with geometry stealing:

julia> @benchmark Tables.columntable(AG.getFDPlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30
BenchmarkTools.Trial: 169 samples with 1 evaluation.
 Range (min … max):  158.547 ms … 245.862 ms  ┊ GC (min … max): 0.00% … 7.86%
 Time  (median):     169.471 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   178.152 ms ±  22.915 ms  ┊ GC (mean ± σ):  1.48% ± 2.67%

   ▁▃▃▆▄▇█▁▃▂▆▁                                                  
  ▇████████████▇▄▃▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▃▅▇▃▃▆▄▅▃▃▁▃▁▁▁▃ ▃
  159 ms           Histogram: frequency by time          243 ms <

 Memory estimate: 7.75 MiB, allocs estimate: 279394.

Beyond the call to ogr_l_getnextfeature, the last step taking some time is the IGeometry constructor

mathieu17g commented 2 years ago

@yeesian I'm not able to push the commit associated to my previous comment into branch Enhance_ArchGDAL_types. I got the following message:

remote: error: GH006: Protected branch update failed for refs/heads/Enhance_ArchGDAL_types.
remote: error: At least 1 approving review is required by reviewers with write access.
To https://github.com/yeesian/ArchGDAL.jl.git
! [remote rejected] Enhance_ArchGDAL_types -> Enhance_ArchGDAL_types (protected branch hook declined)
error: failed to push some refs to 'https://github.com/yeesian/ArchGDAL.jl.git'

Do you know what I did wrong ?

yeesian commented 2 years ago

I've updated the branch protection settings now, can you try again? Thanks!

mathieu17g commented 2 years ago

It stills fails, but I have a different message:

remote: error: GH006: Protected branch update failed for refs/heads/Enhance_ArchGDAL_types.
remote: error: Changes must be made through a pull request.
To https://github.com/yeesian/ArchGDAL.jl.git
! [remote rejected] Enhance_ArchGDAL_types -> Enhance_ArchGDAL_types (protected branch hook declined)
error: failed to push some refs to 'https://github.com/yeesian/ArchGDAL.jl.git'
mathieu17g commented 2 years ago

Shall I try a force push with lease ?

yeesian commented 2 years ago

It stills fails, but I have a different message

I have further relaxed the branch protection rules -- I'm so sorry for the inconvenience so far -- can you try yet again? I'll get to your PRs soon; have been on the road for the past few days.

mathieu17g commented 2 years ago

Great, it works. I was able to push commit a146c79 above

mathieu17g commented 2 years ago

Continuing to shave off time in layer to table conversion, I noticed that _infergeomtype could be optimized by modifying convert functions between GDAL CEnum.Cenum types to AG Enum types.

New convert functions

function convert(::Type{OGRwkbGeometryType}, gogtinst::GDAL.OGRwkbGeometryType)
    return OGRwkbGeometryType(Integer(gogtinst))
end
function convert(::Type{GDAL.OGRwkbGeometryType}, ogtinst::OGRwkbGeometryType)
    return GDAL.OGRwkbGeometryType(Integer(ogtinst))
end

The convert functions code looks simpler and faster like this, and could be extended to other similar conversions between ArchGDAL Enum and GDAL CEenum.Cenum

This has the prerequisite to align OGRwkbGeometryType Enum instances' assigned values to those of GDAL.OGRwkbGeometryType. It should not have any side effect.

Results

julia> @benchmark AG._infergeomtype($(geom.ptr)) BenchmarkTools.Trial: 10000 samples with 949 evaluations. Range (min … max): 83.511 ns … 209.497 ns ┊ GC (min … max): 0.00% … 0.00% Time (median): 97.937 ns ┊ GC (median): 0.00% Time (mean ± σ): 95.838 ns ± 13.089 ns ┊ GC (mean ± σ): 0.00% ± 0.00%

█ ▆ ▂▂ ▄█▂▁▅ ▂▂ ▁▁▁▃▁▁ ▁ ▂ ▂ █▂█▄██▆▆▅▄█▆██████████▇███████▇▇▇█▇▇▇▇▇██▆▆▇▇█▆▆▆▇▆▅▆▆▄▅▅▆▅▆ █ 83.5 ns Histogram: log(frequency) by time 143 ns <

Memory estimate: 0 bytes, allocs estimate: 0.

- With new convert functions:
```julia
julia> geom = AG.fromWKT("POINT (1 2)")
Geometry: POINT (1 2)

julia> @benchmark AG._infergeomtype($(geom.ptr))
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  18.790 ns … 78.432 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.063 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.519 ns ±  2.273 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▄ ▅  ▁                      ▂ ▂               ▁           ▂
  ███▅██▄█▃▅▄▃▄▁▁▄▃▁▃▁▃▁▁▁▁▁▃▁▁▁█▆█▁▁▁▁▁▁▁▁▁▃▁▁▆█▅█▆▆▇▆▅▅▆▆▆▇ █
  18.8 ns      Histogram: log(frequency) by time      29.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Next

I will go saving time with a trial to implement a specialized version of Tables.eachcolumns to strip some time from calls to ArchGDAL.getfield

mathieu17g commented 2 years ago

Next

I will go saving time with a trial to implement a specialized version of Tables.eachcolumns to strip some time from calls to ArchGDAL.getfield

I have copied and specialized Tables.jl/src/fallback.jl and Tables.jl/src/utils.jl. I had to use GeneralizedGenerated.jl to solve a world age problem with generated functions And now we are closing in on Shapefile.jl performance for layer to table conversion

julia> display(@benchmark DataFrame(AG.getFDPlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30)
       display(@benchmark DataFrame(Shapefile.Table("../../tmp/road/road.shp")) seconds = 30)
BenchmarkTools.Trial: 181 samples with 1 evaluation.
 Range (min … max):  146.779 ms … 251.964 ms  ┊ GC (min … max): 0.00% … 6.67%
 Time  (median):     157.170 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   165.837 ms ±  26.339 ms  ┊ GC (mean ± σ):  0.99% ± 2.04%

   ▂▇▅█▅▁▁ ▆▁▁                                                   
  ▅███████▇███▅▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▁▃▁▄▁▄▃▃▃▃ ▃
  147 ms           Histogram: frequency by time          251 ms <

 Memory estimate: 6.32 MiB, allocs estimate: 134898.
BenchmarkTools.Trial: 194 samples with 1 evaluation.
 Range (min … max):  146.609 ms … 164.070 ms  ┊ GC (min … max): 0.00% … 7.49%
 Time  (median):     152.686 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   154.683 ms ±   5.141 ms  ┊ GC (mean ± σ):  2.93% ± 3.05%

      ▂  ▄▃▂▅▃█▄▂▅                            ▄▂  ▇ ▃            
  ▃▁▅▁█▆▁█████████▇▆▅▆▃▇▅▁▁▁▁▁▁▁▅▁▁▁▁▃▃▅▅▆██████▇▇█▆█▆▃█▅▅▃▃▃▁▃ ▃
  147 ms           Histogram: frequency by time          164 ms <

 Memory estimate: 36.71 MiB, allocs estimate: 243065.
mathieu17g commented 2 years ago

@visr you mentioned in comment https://github.com/yeesian/ArchGDAL.jl/pull/238#issuecomment-928958407 that you had a case where layer to table conversion took a longtime

@mathieu17g we can discuss copies and such in a different issue. I still wanted to create one, because I saw that converting a layer to DataFrame for a large vector file took minutes.

Could you hand me the vector file example, to test it with this PR ?

visr commented 2 years ago

I believe in that case I was reading the HydroATLAS / BasinATLAS data. This is a direct link to the download: https://figshare.com/articles/dataset/HydroATLAS_version_1_0/9890531?file=20087237

One nice property of this dataset is that is has detail levels 1 to 12, which gradually get larger. Especially the higher levels started to get really slow I noticed.

mathieu17g commented 2 years ago

@visr Ok thanks. I will test the PR with this file set

mathieu17g commented 2 years ago

@visr I have reimplemented Tables.columns to avoid the long compilation time for cases like with HydroATLAS data

Results

1) Faster than with Shapefile uncompiled or compiled, starting from level 6 2) Nevermore than x2 slower uncompiled compared to Shapefile. The worst case being level 1 : 4s vs 2s on my mac (8 years old), which is still reasonable compared to current ArchGDAL master which takes around 150s for level 1+

Performance with HydroATLAS basin data level 6

Shapefile.jl

Uncompiled Shapefile:    7.719764 seconds (10.36 M allocations: 792.434 MiB, 6.68% gc time)
Compiled Shapefile:    BenchmarkTools.Trial: 50 samples with 1 evaluation.
 Range (min … max):  1.790 s …   2.263 s  ┊ GC (min … max): 1.03% … 10.02%
 Time  (median):     1.979 s              ┊ GC (median):    0.92%
 Time  (mean ± σ):   1.999 s ± 94.965 ms  ┊ GC (mean ± σ):  1.11% ±  1.29%

                     ▁▃▁█          ▁ ▃ ▁                     
  ▇▁▁▄▁▁▄▁▁▄▄▄▄▁▁▁▄▄▁████▄▄▁▄▄▄▁▁▄▄█▇█▄█▄▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  1.79 s         Histogram: frequency by time        2.26 s <

 Memory estimate: 501.84 MiB, allocs estimate: 5446511.

ArchGDAL 0.7.4

Uncompiled AG 0.7.4 :  155.592979 seconds (69.32 M allocations: 3.783 GiB, 0.67% gc time, 93.75% compilation time)
Compiled AG 0.7.4 :    BenchmarkTools.Trial: 5 samples with 1 evaluation.
 Range (min … max):  9.593 s …    9.887 s  ┊ GC (min … max): 2.33% … 2.87%
 Time  (median):     9.750 s               ┊ GC (median):    2.73%
 Time  (mean ± σ):   9.739 s ± 124.606 ms  ┊ GC (mean ± σ):  2.68% ± 0.22%

  █       █                      █              █          █  
  █▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁█ ▁
  9.59 s         Histogram: frequency by time         9.89 s <

 Memory estimate: 443.71 MiB, allocs estimate: 10380858.

FeatureLayer with stealgeom and new (Tables.)columns

Uncompiled AG normal:    3.652381 seconds (11.98 M allocations: 291.840 MiB, 1.87% gc time, 38.22% compilation time)
Compiled AG normal:    BenchmarkTools.Trial: 50 samples with 1 evaluation.
 Range (min … max):  2.418 s …    7.823 s  ┊ GC (min … max): 7.82% … 6.07%
 Time  (median):     2.496 s               ┊ GC (median):    7.93%
 Time  (mean ± σ):   2.690 s ± 812.183 ms  ┊ GC (mean ± σ):  7.66% ± 0.72%

  █▄▄                                                         
  ███▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
  2.42 s       Histogram: log(frequency) by time      7.82 s <

 Memory estimate: 179.96 MiB, allocs estimate: 10161324.

FDP_FeatureLayer with (Tables.)columns using plain @generated read function

Uncompiled AG FDP pg:    5.266566 seconds (11.17 M allocations: 492.128 MiB, 6.47% gc time, 3.22% compilation time)
Compiled AG FDP pg:    BenchmarkTools.Trial: 50 samples with 1 evaluation.
 Range (min … max):  1.458 s …    3.087 s  ┊ GC (min … max): 1.19% … 0.74%
 Time  (median):     1.634 s               ┊ GC (median):    1.31%
 Time  (mean ± σ):   1.694 s ± 244.873 ms  ┊ GC (mean ± σ):  1.28% ± 0.13%

       █▄ ▂                                                   
  ▅▁▁▇███▅█▆▆▅▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  1.46 s         Histogram: frequency by time         3.09 s <

 Memory estimate: 148.99 MiB, allocs estimate: 5379084.

Making of

I have kept all the Table.columns reimplementation trials in the code of the commit below for archive in case we have to come back to it later

I have dropped the use of immutable Tuples and the line by line column types widening in favor the use of mutable Vectors with wide types and post parsing for type narrowing as sketched for the schema build in issue #223

For the feature reading, I have tried to reimplement eachcolumns from Tables.jl/src/utils.jl with:

But using plain generated function and get column turned out to be faster both uncompiled and compiled. It is counter intuitive, and maybe my use of GeneralizedGenerated.jl and RuntimeGeneratedFunction.jl could be enhanced (reason why I left the code in the commit)

Results for bigger HydroATLAS file: level 12

Shapefile 1st run:  173.921930 seconds (347.88 M allocations: 20.671 GiB, 11.56% gc time)
Shapefile 2nd run:  190.139144 seconds (342.95 M allocations: 20.386 GiB, 17.76% gc time)

AG FDP pg 1st run:   98.029999 seconds (353.07 M allocations: 9.472 GiB, 8.89% gc time, 0.19% compilation time)
AG FDP pg 2nd run:   93.194404 seconds (348.00 M allocations: 9.178 GiB, 8.46% gc time, 0.02% compilation time)

Results for road.shp (used in previous commits/comments)

Uncompiled Shapefile:    2.908215 seconds (5.19 M allocations: 328.995 MiB, 11.32% gc time)
Compiled Shapefile:    BenchmarkTools.Trial: 193 samples with 1 evaluation.
 Range (min … max):  146.924 ms … 183.801 ms  ┊ GC (min … max): 0.00% … 5.49%
 Time  (median):     156.321 ms               ┊ GC (median):    4.59%
 Time  (mean ± σ):   155.529 ms ±   5.481 ms  ┊ GC (mean ± σ):  2.91% ± 2.76%

       ▄ ▅▂▄▅▂▂ ▂                  ▄▇ █▄▇                        
  ▅▃▅▅▅██████████▆▆▅▃▅▁▃▃▆▁▅▅▆▃██▆▅██▅███▅▆██▅▆▅▅▅▆▃▁▃▃▅▁▁█▁▃▃▃ ▃
  147 ms           Histogram: frequency by time          167 ms <

 Memory estimate: 36.71 MiB, allocs estimate: 243067.

Uncompiled AG FDP pg:    2.505719 seconds (5.13 M allocations: 294.994 MiB, 3.47% gc time, 7.39% compilation time)
Compiled AG FDP pg:    BenchmarkTools.Trial: 161 samples with 1 evaluation.
 Range (min … max):  166.438 ms … 263.748 ms  ┊ GC (min … max): 0.00% … 7.81%
 Time  (median):     178.569 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   187.196 ms ±  25.490 ms  ┊ GC (mean ± σ):  1.39% ± 2.57%

   █▃▅▂▇  ▄▃  ▂                                                  
  ▆██████████▆█▆▄▄▃▃▁▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▄▃▃▄▁▅▃▁▁▃▃▃▁▄▃▃▁▃▁▅ ▃
  166 ms           Histogram: frequency by time          260 ms <

 Memory estimate: 8.17 MiB, allocs estimate: 292907.
visr commented 2 years ago

Great that you were able to improve the performance so much! I don't really have experience with generated / RuntimeGeneratedFunctions / GeneralizedGenerated, so it's hard for me to comment on it, other than that it's probably nice if we don't have to add a dependency. Do any of the limitations of generated functions from Base cause issues, like the world-age issues or closures? What was the reason to look into those packages?

mathieu17g commented 2 years ago

Do any of the limitations of generated functions from Base cause issues, like the world-age issues or closures? What was the reason to look into those packages?

I had the world age issue when trying to call directly asXXX functions.

But the overhead of calling getindex on a vector of asXXX functions at runtime appears to be mitigated somehow by the use of Base.@generated compared to using a FeatureDefn tailored vector function (field1, ..., fieldn) -> (astypeof field1(field1), ..., astypeoffieldn(fieldn)) built with @RuntimeGeneratedFunction.

So the use of Base.@generated functions is faster and simpler. I think it’s good enough.

I’ll make a PR on master for the new Tables.columns for AbstractFeatureLayer

mathieu17g commented 2 years ago

Repairing Tables methods unit testing, I had to revert from stealgeom to getgeom.

The impact on performance is limited for HydroATLAS basin test files and French main roads test file (see tests results below). FDP layer to table conversion is on par or faster (for big files) compared to Shapefile.jl

stealgeom could still be interesting:

AG FDP pg 1st run: 4.993936 seconds (10.46 M allocations: 450.400 MiB, 2.49% gc time, 3.47% compilation time) Compiled AG FDP pg: BenchmarkTools.Trial: 17 samples with 1 evaluation. Range (min … max): 1.553 s … 1.979 s ┊ GC (min … max): 1.75% … 1.98% Time (median): 1.839 s ┊ GC (median): 1.91% Time (mean ± σ): 1.813 s ± 132.176 ms ┊ GC (mean ± σ): 1.95% ± 0.71%

█ ▁ ▁ ▁ █▁▁ █ ▁ ▁ ▁ ▁ █
█▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁███▁█▁▁▁█▁█▁▁█▁█▁▁▁▁▁█ ▁ 1.55 s Histogram: frequency by time 1.98 s <

Memory estimate: 148.99 MiB, allocs estimate: 5379083.

#### French main roads file
```julia
Shapefile 1st run:    2.446287 seconds (5.19 M allocations: 329.013 MiB, 5.75% gc time)
Compiled Shapefile: BenchmarkTools.Trial: 192 samples with 1 evaluation.
 Range (min … max):  147.897 ms … 170.355 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     157.005 ms               ┊ GC (median):    4.41%
 Time  (mean ± σ):   156.399 ms ±   4.845 ms  ┊ GC (mean ± σ):  2.94% ± 2.80%

        ▂ ▁ ▂  ▁▇█▁   ▂ ▁           ▁ ▂      ▄▄▂▁▂  ▅            
  ▃▃▃▃▃▃█▅█▆█████████▃███▁▅▁▁▅▁▁▃▁▅▃█▅█▆▅▃█▆██████▃██▁█▆▃▅▅▅▆▁▅ ▃
  148 ms           Histogram: frequency by time          165 ms <

 Memory estimate: 36.71 MiB, allocs estimate: 243067.

AG FDP pg 1st run:    2.479176 seconds (5.13 M allocations: 295.079 MiB, 4.28% gc time, 8.29% compilation time)
Compiled AG FDP pg: BenchmarkTools.Trial: 153 samples with 1 evaluation.
 Range (min … max):  176.988 ms … 274.031 ms  ┊ GC (min … max): 0.00% … 7.41%
 Time  (median):     187.198 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   196.996 ms ±  24.662 ms  ┊ GC (mean ± σ):  1.46% ± 2.73%

    ▁█▇ ▂  ▁                                                     
  ▃▄█████▇▇█▄▅▅▇▃▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▃▃▃▃▃▃▃▃▃▃▃▁▃▃ ▃
  177 ms           Histogram: frequency by time          264 ms <

 Memory estimate: 8.17 MiB, allocs estimate: 292898.
yeesian commented 2 years ago

I'm really sorry, but I'm going through a period of bereavement due to a personal loss, and nominate @visr to review this PR in my absence. If he might be unavailable to do so, I'll review it when I'm in the appropriate state to do so.

(At a high level, performance matters to me: I'm partial to both this PR and https://github.com/yeesian/ArchGDAL.jl/pull/243 and will allow for them to introduce breaking/experimental changes since it's still pre-1.0 and might give some directional information/guidance on workflows and APIs that enable performance gains. I'd review with an eye towards complexity and maintainability, but my preferences are ultimately non-blocking and I'm willing to postpone them until after @mathieu17g has explored the space of generated functions and unexploited GDAL functions that are important for optimization and might have implications for the API.)

visr commented 2 years ago

Really sorry to hear that @yeesian. Of course take all the time you need, I wish you strength during a difficult time.

I can probably help out a bit here with reviewing. As it is right now I don't fully understand the PR, but that also has to do with my limited experience with generated functions and since I didn't properly dive into the code yet. @mathieu17g hopefully when you come out of the (impressive) rabbit hole here the result will be sufficiently simple.

mathieu17g commented 2 years ago

@yeesian, I'm so sorry for your loss. Thank you for having taken the time to share your views with us in those difficult times.

As of this PR and @generated functions, they are quite adapted to the GDAL transposition to Julia for fields manipulation. They fill the performance gap with what a C compiler would optimize on field manipulations when doing something like building a dataframe. It remains to be seen, if there is any gain outside this use case. If not we could end up confining the new parameterized types and the associated generated functions to the use case of direct conversion from geofiles to table source.

mathieu17g commented 2 years ago

@visr I have:

Could you please have a look to it, and test its performance on your own. BenchmarkCI judgement gives results a bit different from the ones I get locally, without any order change nonetheless. Below a copy of commit 2ca3a4c results:

                                                                          ID            time GC time         memory allocations
  –––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– ––––––––––––––– ––––––– –––––––––––––– –––––––––––
                              ["shapefile_to_table", "frenchroads_ArchGDAL"] 285.500 ms (5%)          8.43 MiB (1%)      373899
         ["shapefile_to_table", "frenchroads_ArchGDAL_new_Tables_interface"] 238.089 ms (5%)          7.19 MiB (1%)      279339
  ["shapefile_to_table", "frenchroads_ArchGDAL_new_Tables_interface_vsizip"] 366.701 ms (5%)          7.20 MiB (1%)      279381
                       ["shapefile_to_table", "frenchroads_ArchGDAL_vsizip"] 412.876 ms (5%)          8.43 MiB (1%)      373941
                             ["shapefile_to_table", "frenchroads_Shapefile"] 187.883 ms (5%)         35.39 MiB (1%)      243030
visr commented 2 years ago

@mathieu17g I'm a little bit lost. It would be of great help to me if you could write a high level summary of (1) the issue that this PR is trying to fix, and (2) how the proposed changes help fix this issue. I have only a rough understanding about ArchGDAL Tables performance issues. If there is a simple change that solves the largest performance bottleneck, but leaves another 10% on the table, that may well be preferred.

I see you introduce new parametrized structs besides the original unparametrized ones. Is that for now to isolate the changes, or is there a reason we'd have to keep both around?

mathieu17g commented 2 years ago

@visr sure, I will make a summary

mathieu17g commented 2 years ago

@visr this PR tries to solve two issues and introduces some more limited enhancements:

  1. yeesian/ArchGDAL.jl/issues/246 by dispatching getfield on types: x2.5 speedup but complex parametric types are introduced
  2. your unreferenced issue concerning the first run performance on ArchGDAL table interface for HydroATLAS basin shapefiles: x50 speedup
  3. other enhancements:
    1. Geometry stealing instead of cloning: from x1.5 to x500 speedup
    2. Draft of dependency between GDAL OGR objects mapping in ArchGDAL, sketched on new parametric types
    3. Simplification of OGRwkbGeometryType conversions between GDAL and ArchGDAL, used in ArchGDAL Geometry and IGeometry constructors: x30 speedup

The parametric types introduced to solve the 1st issue can be:

  1. extended to replace current ArchGDAL OGR related types
  2. restricted to the table interface implementation and hidden within a Table struct

Since getfield is mainly used in the ArchGDAL table interface where we read and convert all fields of all features of a feature layer, the performance enhancement is mainly felt on the table interface. So option 2 allows to limit the ArchGDAL overall code complexity. The replacement of ArchGDAL OGR related types could be considered later should a new benefit arise. But if both @visr and @yeesian are confortable with this parametric types, we can go for a replacement now.

Follow-ups besides the parametric types handling:

1. getfield dispatch on field types

Conclusion:

2. Table interface performance enhancement

@visr you raised the issue that with ArchGDAL ≤ v0.8.0, which uses Tables.jl fallback function Tables.columns with nothing as Schema, the reading of files with many fields was taking "minutes".
Your case takes on my mac: ~150s at the first run vs ~3s at the second

ArchGDAL v0.8.0 table interface uncompiled performance on HydroATLAS basin shapefile level 1

```julia julia> using Pkg; Pkg.precompile(); Pkg.status("ArchGDAL") import ArchGDAL as AG using DataFrames; using BenchmarkTools; using Printf level = 1; file = "tmp/HydroATLAS_v10/BasinATLAS_Data_v10_shp/BasinATLAS_v10_shp/BasinATLAS_v10_lev$(@sprintf("%02i", level)).shp" @time DataFrame(AG.getlayer(AG.read(file), 0)) Status `~/.../Project.toml` [c9ce4bd3] ArchGDAL v0.8.0 `dev/ArchGDAL` 156.487261 seconds (55.14 M allocations: 3.081 GiB, 0.52% gc time, 99.42% compilation time) 10×295 DataFrame Row │ HYBAS_ID NEXT_DOWN NEXT_SINK MAIN_BAS DIST_SINK DIST_MAIN SUB_AREA UP_AREA PFAF_ID ⋯ │ IGeometr… Int64 Int64 Int64 Int64 Float64 Float64 Float64 Float64 Int64 ⋯ ─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 1 │ Geometry: wkbMultiPolygon 1010000010 0 1010000010 1010000010 0.0 0.0 2.99531e7 2.99531e7 1 ⋯ 2 │ Geometry: wkbMultiPolygon 2010000010 0 2010000010 2010000010 0.0 0.0 1.78589e7 1.78589e7 2 3 │ Geometry: wkbMultiPolygon 3010000010 0 3010000010 3010000010 0.0 0.0 1.29491e7 1.29491e7 3 4 │ Geometry: wkbMultiPolygon 4010000010 0 4010000010 4010000010 0.0 0.0 2.08274e7 2.08274e7 4 5 │ Geometry: wkbMultiPolygon 5010000010 0 5010000010 5010000010 0.0 0.0 1.08038e7 1.08038e7 5 ⋯ 6 │ Geometry: wkbMultiPolygon 6010000010 0 6010000010 6010000010 0.0 0.0 1.78535e7 1.78535e7 6 7 │ Geometry: wkbMultiPolygon 7010000010 0 7010000010 7010000010 0.0 0.0 1.59146e7 1.59146e7 7 8 │ Geometry: wkbMultiPolygon 8010000010 0 8010000010 8010000010 0.0 0.0 6.19709e6 6.19709e6 8 9 │ Geometry: wkbMultiPolygon 8010020760 0 8010020760 8010020760 0.0 0.0 1.17002e5 1.17002e5 3 ⋯ 10 │ Geometry: wkbMultiPolygon 9010000010 0 9010000010 9010000010 0.0 0.0 2.14672e6 2.14672e6 9 285 columns omitted ```

Raising the SPECIALIZATION_THRESHOLD and related values in Tables.jl/src/utils.jl (in @nexprs) above the number of fields (400 tried for the basin shapefile level 1 which has around 300 fields), does not bring any performance enhancement

ArchGDAL v0.7.4 table interface uncompiled performance on HydroATLAS basin shapefile level 1, with raised to 400 SPECIALIZATION_THRESHOLD and related values in Tables.jl v1.6.1

```julia julia> using Pkg; Pkg.update(); Pkg.status("ArchGDAL") import ArchGDAL as AG using DataFrames; using BenchmarkTools; using Printf level = 1; file = "tmp/HydroATLAS_v10/BasinATLAS_Data_v10_shp/BasinATLAS_v10_shp/BasinATLAS_v10_lev$(@sprintf("%02i", level)).shp" @time DataFrame(AG.getlayer(AG.read(file), 0)) Updating registry at `~/.julia/registries/General` Updating git-repo `https://github.com/JuliaRegistries/General.git` No Changes to `~/.../Project.toml` No Changes to `~/.../Manifest.toml` Precompiling project... 18 dependencies successfully precompiled in 49 seconds (242 already precompiled) Status `~/.../Project.toml` [c9ce4bd3] ArchGDAL v0.7.4 `dev/ArchGDAL` 159.786486 seconds (61.80 M allocations: 3.383 GiB, 0.56% gc time, 99.45% compilation time) 10×295 DataFrame Row │ HYBAS_ID NEXT_DOWN NEXT_SINK MAIN_BAS DIST_SINK DIST_MAIN SUB_AREA UP_AREA P ⋯ │ IGeometr… Int64 Int64 Int64 Int64 Float64 Float64 Float64 Float64 I ⋯ ─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 1 │ Geometry: wkbMultiPolygon 1010000010 0 1010000010 1010000010 0.0 0.0 2.99531e7 2.99531e7 ⋯ 2 │ Geometry: wkbMultiPolygon 2010000010 0 2010000010 2010000010 0.0 0.0 1.78589e7 1.78589e7 3 │ Geometry: wkbMultiPolygon 3010000010 0 3010000010 3010000010 0.0 0.0 1.29491e7 1.29491e7 4 │ Geometry: wkbMultiPolygon 4010000010 0 4010000010 4010000010 0.0 0.0 2.08274e7 2.08274e7 5 │ Geometry: wkbMultiPolygon 5010000010 0 5010000010 5010000010 0.0 0.0 1.08038e7 1.08038e7 ⋯ 6 │ Geometry: wkbMultiPolygon 6010000010 0 6010000010 6010000010 0.0 0.0 1.78535e7 1.78535e7 7 │ Geometry: wkbMultiPolygon 7010000010 0 7010000010 7010000010 0.0 0.0 1.59146e7 1.59146e7 8 │ Geometry: wkbMultiPolygon 8010000010 0 8010000010 8010000010 0.0 0.0 6.19709e6 6.19709e6 9 │ Geometry: wkbMultiPolygon 8010020760 0 8010020760 8010020760 0.0 0.0 1.17002e5 1.17002e5 ⋯ 10 │ Geometry: wkbMultiPolygon 9010000010 0 9010000010 9010000010 0.0 0.0 2.14672e6 2.14672e6 286 columns omitted ```

The performance bottleneck at first run is located in Tables.add_or_widen! (seen when profiling). Probably around the columm allocation machinery based on Tuple
PROBABLY NEEDS AN ISSUE TO BE RAISED IN TABLES.JL
If we force the usage of Tables.add! instead of Tables.add_or_widen!, by giving the Tables.Schema to Tables.buildcolumns in Tables.columns, the performance issue disappear (see below): 3.22s intead of ~150s on my mac

ArchGDAL v0.7.4 table interface uncompiled performance on HydroATLAS basin shapefile level 1, with a detailed schema specified

```julia julia> using Pkg; Pkg.update(); Pkg.status("ArchGDAL") import ArchGDAL as AG using DataFrames; using BenchmarkTools; using Printf; using Tables level = 1; file = "tmp/HydroATLAS_v10/BasinATLAS_Data_v10_shp/BasinATLAS_v10_shp/BasinATLAS_v10_lev$(@sprintf("%02i", level)).shp" sch = Tables.Schema((Symbol(""), :HYBAS_ID, :NEXT_DOWN, :NEXT_SINK, :MAIN_BAS, :DIST_SINK, :DIST_MAIN, :SUB_AREA, :UP_AREA, :PFAF_ID, :ENDO, :COAST, :ORDER_, :SORT, :dis_m3_pyr, :dis_m3_pmn, :dis_m3_pmx, :run_mm_syr, :inu_pc_smn, :inu_pc_umn, :inu_pc_smx, :inu_pc_umx, :inu_pc_slt, :inu_pc_ult, :lka_pc_sse, :lka_pc_use, :lkv_mc_usu, :rev_mc_usu, :dor_pc_pva, :ria_ha_ssu, :ria_ha_usu, :riv_tc_ssu, :riv_tc_usu, :gwt_cm_sav, :ele_mt_sav, :ele_mt_uav, :ele_mt_smn, :ele_mt_smx, :slp_dg_sav, :slp_dg_uav, :sgr_dk_sav, :clz_cl_smj, :cls_cl_smj, :tmp_dc_syr, :tmp_dc_uyr, :tmp_dc_smn, :tmp_dc_smx, :tmp_dc_s01, :tmp_dc_s02, :tmp_dc_s03, :tmp_dc_s04, :tmp_dc_s05, :tmp_dc_s06, :tmp_dc_s07, :tmp_dc_s08, :tmp_dc_s09, :tmp_dc_s10, :tmp_dc_s11, :tmp_dc_s12, :pre_mm_syr, :pre_mm_uyr, :pre_mm_s01, :pre_mm_s02, :pre_mm_s03, :pre_mm_s04, :pre_mm_s05, :pre_mm_s06, :pre_mm_s07, :pre_mm_s08, :pre_mm_s09, :pre_mm_s10, :pre_mm_s11, :pre_mm_s12, :pet_mm_syr, :pet_mm_uyr, :pet_mm_s01, :pet_mm_s02, :pet_mm_s03, :pet_mm_s04, :pet_mm_s05, :pet_mm_s06, :pet_mm_s07, :pet_mm_s08, :pet_mm_s09, :pet_mm_s10, :pet_mm_s11, :pet_mm_s12, :aet_mm_syr, :aet_mm_uyr, :aet_mm_s01, :aet_mm_s02, :aet_mm_s03, :aet_mm_s04, :aet_mm_s05, :aet_mm_s06, :aet_mm_s07, :aet_mm_s08, :aet_mm_s09, :aet_mm_s10, :aet_mm_s11, :aet_mm_s12, :ari_ix_sav, :ari_ix_uav, :cmi_ix_syr, :cmi_ix_uyr, :cmi_ix_s01, :cmi_ix_s02, :cmi_ix_s03, :cmi_ix_s04, :cmi_ix_s05, :cmi_ix_s06, :cmi_ix_s07, :cmi_ix_s08, :cmi_ix_s09, :cmi_ix_s10, :cmi_ix_s11, :cmi_ix_s12, :snw_pc_syr, :snw_pc_uyr, :snw_pc_smx, :snw_pc_s01, :snw_pc_s02, :snw_pc_s03, :snw_pc_s04, :snw_pc_s05, :snw_pc_s06, :snw_pc_s07, :snw_pc_s08, :snw_pc_s09, :snw_pc_s10, :snw_pc_s11, :snw_pc_s12, :glc_cl_smj, :glc_pc_s01, :glc_pc_s02, :glc_pc_s03, :glc_pc_s04, :glc_pc_s05, :glc_pc_s06, :glc_pc_s07, :glc_pc_s08, :glc_pc_s09, :glc_pc_s10, :glc_pc_s11, :glc_pc_s12, :glc_pc_s13, :glc_pc_s14, :glc_pc_s15, :glc_pc_s16, :glc_pc_s17, :glc_pc_s18, :glc_pc_s19, :glc_pc_s20, :glc_pc_s21, :glc_pc_s22, :glc_pc_u01, :glc_pc_u02, :glc_pc_u03, :glc_pc_u04, :glc_pc_u05, :glc_pc_u06, :glc_pc_u07, :glc_pc_u08, :glc_pc_u09, :glc_pc_u10, :glc_pc_u11, :glc_pc_u12, :glc_pc_u13, :glc_pc_u14, :glc_pc_u15, :glc_pc_u16, :glc_pc_u17, :glc_pc_u18, :glc_pc_u19, :glc_pc_u20, :glc_pc_u21, :glc_pc_u22, :pnv_cl_smj, :pnv_pc_s01, :pnv_pc_s02, :pnv_pc_s03, :pnv_pc_s04, :pnv_pc_s05, :pnv_pc_s06, :pnv_pc_s07, :pnv_pc_s08, :pnv_pc_s09, :pnv_pc_s10, :pnv_pc_s11, :pnv_pc_s12, :pnv_pc_s13, :pnv_pc_s14, :pnv_pc_s15, :pnv_pc_u01, :pnv_pc_u02, :pnv_pc_u03, :pnv_pc_u04, :pnv_pc_u05, :pnv_pc_u06, :pnv_pc_u07, :pnv_pc_u08, :pnv_pc_u09, :pnv_pc_u10, :pnv_pc_u11, :pnv_pc_u12, :pnv_pc_u13, :pnv_pc_u14, :pnv_pc_u15, :wet_cl_smj, :wet_pc_sg1, :wet_pc_ug1, :wet_pc_sg2, :wet_pc_ug2, :wet_pc_s01, :wet_pc_s02, :wet_pc_s03, :wet_pc_s04, :wet_pc_s05, :wet_pc_s06, :wet_pc_s07, :wet_pc_s08, :wet_pc_s09, :wet_pc_u01, :wet_pc_u02, :wet_pc_u03, :wet_pc_u04, :wet_pc_u05, :wet_pc_u06, :wet_pc_u07, :wet_pc_u08, :wet_pc_u09, :for_pc_sse, :for_pc_use, :crp_pc_sse, :crp_pc_use, :pst_pc_sse, :pst_pc_use, :ire_pc_sse, :ire_pc_use, :gla_pc_sse, :gla_pc_use, :prm_pc_sse, :prm_pc_use, :pac_pc_sse, :pac_pc_use, :tbi_cl_smj, :tec_cl_smj, :fmh_cl_smj, :fec_cl_smj, :cly_pc_sav, :cly_pc_uav, :slt_pc_sav, :slt_pc_uav, :snd_pc_sav, :snd_pc_uav, :soc_th_sav, :soc_th_uav, :swc_pc_syr, :swc_pc_uyr, :swc_pc_s01, :swc_pc_s02, :swc_pc_s03, :swc_pc_s04, :swc_pc_s05, :swc_pc_s06, :swc_pc_s07, :swc_pc_s08, :swc_pc_s09, :swc_pc_s10, :swc_pc_s11, :swc_pc_s12, :lit_cl_smj, :kar_pc_sse, :kar_pc_use, :ero_kh_sav, :ero_kh_uav, :pop_ct_ssu, :pop_ct_usu, :ppd_pk_sav, :ppd_pk_uav, :urb_pc_sse, :urb_pc_use, :nli_ix_sav, :nli_ix_uav, :rdd_mk_sav, :rdd_mk_uav, :hft_ix_s93, :hft_ix_u93, :hft_ix_s09, :hft_ix_u09, :gad_id_smj, :gdp_ud_sav, :gdp_ud_ssu, :gdp_ud_usu, :hdi_ix_sav),(AG.IGeometry{AG.wkbMultiPolygon}, Int64, Int64, Int64, Int64, Float64, Float64, Float64, Float64, Int64, Int32, Int32, Int32, Int32, Float64, Float64, Float64, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Float64, Float64, Float64, Float64, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Float64, Float64, Float64, Float64, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int64, Int64, Int32)) @time DataFrame(Tables.CopiedColumns(Tables.buildcolumns(sch, AG.getlayer(AG.read(file), 0)))) Updating registry at `~/.julia/registries/General` Updating git-repo `https://github.com/JuliaRegistries/General.git` No Changes to `~/.../Project.toml` No Changes to `~/.../Manifest.toml` Status `~/.../Project.toml` [c9ce4bd3] ArchGDAL v0.7.4 `dev/ArchGDAL` 3.229805 seconds (4.18 M allocations: 216.228 MiB, 13.59% gc time, 75.90% compilation time) 10×295 DataFrame Row │ HYBAS_ID NEXT_DOWN NEXT_SINK MAIN_BAS DIST_SINK DIST_MAIN SUB_AREA ⋯ │ IGeometr… Int64 Int64 Int64 Int64 Float64 Float64 Float64 ⋯ ─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────── 1 │ Geometry: wkbMultiPolygon 1010000010 0 1010000010 1010000010 0.0 0.0 2.99531e7 ⋯ 2 │ Geometry: wkbMultiPolygon 2010000010 0 2010000010 2010000010 0.0 0.0 1.78589e7 3 │ Geometry: wkbMultiPolygon 3010000010 0 3010000010 3010000010 0.0 0.0 1.29491e7 4 │ Geometry: wkbMultiPolygon 4010000010 0 4010000010 4010000010 0.0 0.0 2.08274e7 5 │ Geometry: wkbMultiPolygon 5010000010 0 5010000010 5010000010 0.0 0.0 1.08038e7 ⋯ 6 │ Geometry: wkbMultiPolygon 6010000010 0 6010000010 6010000010 0.0 0.0 1.78535e7 7 │ Geometry: wkbMultiPolygon 7010000010 0 7010000010 7010000010 0.0 0.0 1.59146e7 8 │ Geometry: wkbMultiPolygon 8010000010 0 8010000010 8010000010 0.0 0.0 6.19709e6 9 │ Geometry: wkbMultiPolygon 8010020760 0 8010020760 8010020760 0.0 0.0 1.17002e5 ⋯ 10 │ Geometry: wkbMultiPolygon 9010000010 0 9010000010 9010000010 0.0 0.0 2.14672e6 287 columns omitted ```

Unfortunately, we cannot know the tightest schema for sure from a layer feature definition (see the case of Shapefile driver with LineString and MultiLineString). Therefore we would have to provide a generic looser schema using:

and tighten it after the layer has been read. But this feature is not provided by Tables.jl. Therefore we have to implement Tables.columns as mentionned in my fist issue on ArchGDAL :-)

This leads to implementing on ArchGDAL.jl side, at least to handle the missing geometry case, something faster than buildcolumns(::Nothing, rowitr::T) where {T} -> _buildcolumns -> eachcolumns / add_or_widen! -> __buildcolumns. I doubt that it could be done easily.

Tables.columns(::AbstractFeatureLayer) is implemented in this PR in src/tables.jl. Here is what it does:

With this function the uncompiled performance is back to an execution time similar to the one of Tables.jl fallback Tables.columns with a specified schema: 3.37s on my mac

ArchGDAL table interface uncompiled performance on HydroATLAS basin shapefile level 1, with its own Tables.columns function implementation

```julia julia> using Pkg; Pkg.update(); Pkg.status("ArchGDAL") import ArchGDAL as AG using DataFrames; using BenchmarkTools; using Printf level = 1; file = "tmp/HydroATLAS_v10/BasinATLAS_Data_v10_shp/BasinATLAS_v10_shp/BasinATLAS_v10_lev$(@sprintf("%02i", level)).shp" @time DataFrame(AG.getlayer(AG.read(file), 0)) Updating registry at `~/.julia/registries/General` Updating git-repo `https://github.com/JuliaRegistries/General.git` Updating `~/.../Project.toml` [c9ce4bd3] ~ ArchGDAL v0.7.4 `dev/ArchGDAL` ⇒ v0.8.0 `dev/ArchGDAL` Updating `~/.../Manifest.toml` [c9ce4bd3] ~ ArchGDAL v0.7.4 `dev/ArchGDAL` ⇒ v0.8.0 `dev/ArchGDAL` Precompiling project... 1 dependency successfully precompiled in 11 seconds (259 already precompiled) Status `~/.../Project.toml` [c9ce4bd3] ArchGDAL v0.8.0 `dev/ArchGDAL` 3.370118 seconds (5.11 M allocations: 281.298 MiB, 3.18% gc time, 76.13% compilation time) 10×295 DataFrame Row │ HYBAS_ID NEXT_DOWN NEXT_SINK MAIN_BAS DIST_SINK DIST_MAIN SUB_AREA UP ⋯ │ IGeometr… Int64 Int64 Int64 Int64 Float64 Float64 Float64 Fl ⋯ ─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────── 1 │ Geometry: wkbMultiPolygon 1010000010 0 1010000010 1010000010 0.0 0.0 2.99531e7 2. ⋯ 2 │ Geometry: wkbMultiPolygon 2010000010 0 2010000010 2010000010 0.0 0.0 1.78589e7 1. 3 │ Geometry: wkbMultiPolygon 3010000010 0 3010000010 3010000010 0.0 0.0 1.29491e7 1. 4 │ Geometry: wkbMultiPolygon 4010000010 0 4010000010 4010000010 0.0 0.0 2.08274e7 2. 5 │ Geometry: wkbMultiPolygon 5010000010 0 5010000010 5010000010 0.0 0.0 1.08038e7 1. ⋯ 6 │ Geometry: wkbMultiPolygon 6010000010 0 6010000010 6010000010 0.0 0.0 1.78535e7 1. 7 │ Geometry: wkbMultiPolygon 7010000010 0 7010000010 7010000010 0.0 0.0 1.59146e7 1. 8 │ Geometry: wkbMultiPolygon 8010000010 0 8010000010 8010000010 0.0 0.0 6.19709e6 6. 9 │ Geometry: wkbMultiPolygon 8010020760 0 8010020760 8010020760 0.0 0.0 1.17002e5 1. ⋯ 10 │ Geometry: wkbMultiPolygon 9010000010 0 9010000010 9010000010 0.0 0.0 2.14672e6 2. 287 columns omitted ```

Conclusion: Performance issue at first run solved for files of similar to the HydroATLAS basin shapefiles (lots of fields): time is now on par with the standard Tables.jl fallback functions with full Tables.Schema provided

3. Other enhancements

3.1. Geometry stealing instead of cloning

When the source does not have to be preserved, stealing geometry is faster. The speedup increases with the geometry size. From x1.5 (Point) up to x500 (big MultiPolygon as in HydroATLAS basin shapefile level 1)

Geometry stealing vs cloning benchmark

```julia julia> import ArchGDAL as AG using GDAL, BenchmarkTools, Printf julia> file = "dev/ArchGDAL/test/data/point.geojson" layer = AG.getlayer(AG.read(file)) @btime AG.IGeometry(GDAL.ogr_g_clone(GDAL.ogr_f_getgeometryref(feature.ptr))) setup=(feature = iterate(AG.copy($layer))[1]) @btime AG.IGeometry(GDAL.ogr_f_stealgeometry(feature.ptr)) setup=(feature = iterate(AG.copy($layer))[1]) nothing 350.620 ns (3 allocations: 48 bytes) 211.821 ns (3 allocations: 48 bytes) julia> level = 1; file = "tmp/HydroATLAS_v10/BasinATLAS_Data_v10_shp/BasinATLAS_v10_shp/BasinATLAS_v10_lev$(@sprintf("%02i", level)).shp" layer = AG.getlayer(AG.read(file)) @btime AG.IGeometry(GDAL.ogr_g_clone(GDAL.ogr_f_getgeometryref(feature.ptr))) setup=(feature = iterate(AG.copy($layer))[1]) @btime AG.IGeometry(GDAL.ogr_f_stealgeometry(feature.ptr)) setup=(feature = iterate(AG.copy($layer))[1]) nothing 676.252 μs (3 allocations: 48 bytes) 1.159 μs (3 allocations: 48 bytes) ```

Since the geometry left when stealing is a NULL geometry, stealgeom can only used when the source layer is transient (see example in src/tables2.jl on Table)

Unfortunately, this function is available only for the first geometry on GDAL C interface via OGR_F_StealGeometry(). An issue has been raised and accepted on GDAL side, waiting for a PR to be proposed

3.2. Dependency between GDAL OGR objects modeled in new parametric types

Below a trial for issue #215 being tackled by @yeesian OGR Julia objects ownership relations diagram V1

3.3. Simplification and speedup of convert for GDAL.OGRwkbGeometryType and ArchGDAL.OGRwkbGeometryType

I have aligned UInt32 values assigned in OGRwkbGeometryType enums between GDAL and ArchGDAL. This allows to avoid the @convert macro, and instead to call directly the enum's constructor on values for conversion
It brings a x30 speedup, propagated to _infergeomtype and the Geometry and IGeometry construtors

OGRwkbGeometryType conversion benchmark between GDAL and ArchGDAL

V0.8.0 `master` branch: ```julia julia> @btime convert(AG.OGRwkbGeometryType, GDAL.wkbMultiPolygon) @btime convert(GDAL.OGRwkbGeometryType, AG.wkbMultiPolygon) nothing 63.165 ns (0 allocations: 0 bytes) 63.103 ns (0 allocations: 0 bytes) ``` `Enhance_ArchGDAL_types` branch: ```julia julia> @btime convert(AG.OGRwkbGeometryType, GDAL.wkbMultiPolygon) @btime convert(GDAL.OGRwkbGeometryType, AG.wkbMultiPolygon) nothing 2.744 ns (0 allocations: 0 bytes) 0.025 ns (0 allocations: 0 bytes) ```

visr commented 2 years ago

Thanks a lot @mathieu17g for taking the time to write down all your findings! I'll get back to it when I have some more time.

mathieu17g commented 2 years ago

Hi @visr, did you had any time lately to look into this PR ? I would prefer that we decide on this one before finishing PR #243.

In the mean time, I proposed a PR in GDAL to be able to extend the geometry stealing performance gain beyond the the first geometry. It is still under review. Once merged, I will see to propagate it all the way to ArchGDAL.

visr commented 2 years ago

@mathieu17g thanks for the bump, and sorry I did not yet sit down to get the proper look that this work deserves. I hope to get to this soon.

I see https://github.com/OSGeo/gdal/pull/5196 landed, nice work there!

Regarding this bit of your writeup:

PROBABLY NEEDS AN ISSUE TO BE RAISED IN TABLES.JL

Perhaps it's a good idea to go ahead and raise that issue, with a reproducible example for them to test the performance? I've seen the same functions show up in my earlier profiles, but couldn't really figure out how to improve it.

mathieu17g commented 2 years ago

@visr I have proposed in https://github.com/JuliaData/Tables.jl/issues/273 solution to overcome the performance issue at first run. But the current PR implementation of Tables.columns will still be faster by a factor of x5 at first run and > x5 for subsequent runs (tested on HydroBASIN level 7)

visr commented 2 years ago

Awesome, I tried your inlining change from https://github.com/JuliaData/Tables.jl/issues/273#issuecomment-1037054582 as well.

On level 4 BasinATLAS I see similar results, around 100x speedup of compilation, and no real effect on runtime, only some less allocations and less time in gc. Noted this was just a simple repeated at-time exercise.

# level 4, 1st run on columntable: 178.628898 seconds (55.54 M allocations: 3.079 GiB, 0.44% gc time, 98.50% compilation time)
# level 4, 2nd run on columntable:   2.119344 seconds (912.34 k allocations: 222.733 MiB, 3.29% gc time)
# level 4, 3rd run on columntable:    2.293723 seconds (912.34 k allocations: 222.733 MiB, 2.50% gc time)
# level 4, 4th run on columntable:    2.434573 seconds (917.14 k allocations: 223.011 MiB, 6.13% gc time, 0.31% compilation time)
# level 4, 5th run on columntable:    2.177187 seconds (912.34 k allocations: 222.733 MiB, 3.34% gc time)
# level 4, 1st run on inlined columntable: 4.830689 seconds (4.07 M allocations: 196.590 MiB, 1.73% gc time, 49.77% compilation time)
# level 4, 2nd run on inlined columntable: 2.478992 seconds (827.06 k allocations: 22.226 MiB)
# level 4, 3rd run on inlined columntable: 2.294762 seconds (827.06 k allocations: 22.226 MiB)
# level 4, 4th run on inlined columntable: 2.247424 seconds (827.06 k allocations: 22.226 MiB)
# level 4, 5th run on inlined columntable: 2.424489 seconds (827.06 k allocations: 22.226 MiB)

Do you want to go ahead and submit that change as a PR to Tables.jl? You'll probably get more feedback that way. And if it lands, we get a free 100x in ArchGDAL!

Good to know that this PR would still improve the situation futher at least 5x. Though we'd have to weigh it against the added complexity of course, it's not a one line change that gives 100x. So let's start there?

mathieu17g commented 2 years ago

@visr

Do you want to go ahead and submit that change as a PR to Tables.jl? You'll probably get more feedback that way. And if it lands, we get a free 100x in ArchGDAL!

I have made the PR https://github.com/JuliaData/Tables.jl/pull/275 to solve the compilation performance issue https://github.com/JuliaData/Tables.jl/issues/273 and another one seen during the investigation on column ordering when using dictcolumntable https://github.com/JuliaData/Tables.jl/issues/274

visr commented 2 years ago

Great, thanks! Though perhaps better to split it into two separate PR's since the two changes are unrelated. The inlining change is straightforward. For the dicts I guess the main question is do they want to take on OrderedCollections.jl as a new dependency.

mathieu17g commented 2 years ago

Ok let's wait for some feedback from the @quinnj

quinnj commented 2 years ago

Sorry for the slowness; I've been slammed w/ non-julia things lately. I'll try to dig in tonight.

maxfreu commented 2 years ago

It seems like a lot of work has gone into this PR. What's the current state? This shouldn't go to waste :)

rafaqz commented 2 years ago

Some of this is superceded by https://github.com/yeesian/ArchGDAL.jl/pull/316

There the enum integers are used in the type directly. Its simple and works. But we could also swap them to real types, that would be a minor change (but no performance improvement).

I tried to use this PR but it contains a lot of unrelated ideas. It should be cleanly separated into separate PRs

maxfreu commented 2 years ago

Ok, to me it looks as if #316 will be easier to merge and comes first. @mathieu17g can you somehow separate the ideas you had into separate PRs, which are more digestible on their own and hence easier to merge? E.g. one for table interface improvement, one for geometry stealing, one for better convert functions (if they can be separated like that). This can maybe be done best after #316 has been merged. Or do you have any objections against merging #316 and arguments for going further with your version (which looks more complicated after a superficial read)?

yeesian commented 2 years ago

I think there's a lot of great ideas here (for not only improvements but also diagrams and benchmarking) that I'm happy to keep it as-is (without deleting or merging) and closing it only after we feel we've made progress on the items that we'd like to incorporate.