mathieu17g opened 2 years ago
- Include field and geomfield names in the feature definition type `FDType` to avoid calling `ogr_f_getfieldindex`
I have changed the FeatureDefn type parameter of ArchGDAL OGR objects from
```julia
FDType = Tuple{NTuple{NG,GType} where NG, NTuple{NF,FType} where NF}
```
which is the same as
```julia
FDType = Tuple{
    Tuple{Vararg{GType, NG}} where NG,
    Tuple{Vararg{FType, NF}} where NF,
}
```
to (thanks to https://discourse.julialang.org/t/parametric-type-with-a-constrained-namedtuple/34511)
```julia
FDType = Tuple{
    NamedTuple{NG,<:Tuple{Vararg{GType}}} where NG,
    NamedTuple{NF,<:Tuple{Vararg{FType}}} where NF,
}
```
(a self-contained sketch of such a name-carrying type parameter is given after this list)
- a way to dispatch on type in `getfield` (to fix issue #246), which brings a 5% performance gain to the latter
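To make the idea concrete, here is a small self-contained sketch (toy `GType`/`FType` singletons and a hypothetical `fieldindex` helper, not the PR's actual definitions) of how carrying the names in the type lets a field index be resolved at code-generation time instead of through `ogr_f_getfieldindex`:

```julia
# Toy stand-ins, for illustration only
struct GType{G} end
struct FType{T} end

const FDType = Tuple{
    NamedTuple{NG,<:Tuple{Vararg{GType}}} where NG,
    NamedTuple{NF,<:Tuple{Vararg{FType}}} where NF,
}

# A concrete feature definition: one geometry field and two attribute fields,
# with the geomfield/field names carried in the type itself
const ExampleFD = Tuple{
    NamedTuple{(:geom,),Tuple{GType{:point}}},
    NamedTuple{(:id, :name),Tuple{FType{Int},FType{String}}},
}
@assert ExampleFD <: FDType

# Hypothetical helper: the 0-based field index is computed while the method
# body is generated, so no runtime name lookup is needed
@generated function fieldindex(::Type{FD}, ::Val{name}) where {FD<:FDType,name}
    fdnames = FD.parameters[2].parameters[1]   # names of the field NamedTuple
    return findfirst(==(name), collect(fdnames)) - 1
end

fieldindex(ExampleFD, Val(:name))   # 1
```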
The total performance gain, with the addition of avoiding most of the calls to `ogr_f_getfieldindex`, is now 10% compared to 1.
@benchmark Tables.columntable(AG.getlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30
BenchmarkTools.Trial: 142 samples with 1 evaluation.
Range (min … max): 182.404 ms … 335.852 ms ┊ GC (min … max): 0.00% … 10.29%
Time (median): 196.571 ms ┊ GC (median): 0.00%
Time (mean ± σ): 213.022 ms ± 37.221 ms ┊ GC (mean ± σ): 3.27% ± 4.88%
▃█▅▄▂▄ ▄
▆██████▇██▇▃▃▁▃▁▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▁▁▃▃▅▄▇▁▃▄▆▃▅▃▁▁▁▁▁▁▁▁▁▃ ▃
182 ms Histogram: frequency by time 318 ms <
Memory estimate: 8.98 MiB, allocs estimate: 373869.
julia> @benchmark Tables.columntable(Shapefile.Table("../../tmp/road/road.shp")) seconds = 30
BenchmarkTools.Trial: 198 samples with 1 evaluation.
Range (min … max): 142.824 ms … 163.894 ms ┊ GC (min … max): 0.00% … 5.48%
Time (median): 152.970 ms ┊ GC (median): 4.10%
Time (mean ± σ): 152.064 ms ± 4.070 ms ┊ GC (mean ± σ): 2.95% ± 2.46%
▃▂▂▄ ▄ ▇▂█▂▄▄▅
▃▁▁▃▁▃▃▅▆▅▆▆▅█▇█████▅██▃▃▅▃▃▅▁█▆██▆▇████████▆▃▆▇▇▆▅▅▁▁▁▃▃▁▁▁▃ ▃
143 ms Histogram: frequency by time 161 ms <
Memory estimate: 35.89 MiB, allocs estimate: 243040.
julia> @benchmark Tables.columntable(AG.getlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30
BenchmarkTools.Trial: 144 samples with 1 evaluation.
Range (min … max): 178.243 ms … 312.567 ms ┊ GC (min … max): 0.00% … 13.64%
Time (median): 191.755 ms ┊ GC (median): 0.00%
Time (mean ± σ): 208.443 ms ± 40.801 ms ┊ GC (mean ± σ): 3.39% ± 5.23%
█▂▁▆▅▂▂
▃████████▅▄▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▁▁▃▃▃▄▃▅▁▃▄▃▃▃▃ ▃
178 ms Histogram: frequency by time 312 ms <
Memory estimate: 8.98 MiB, allocs estimate: 373879.
julia> @benchmark Tables.columntable(AG.getFDPlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30
BenchmarkTools.Trial: 169 samples with 1 evaluation.
Range (min … max): 158.547 ms … 245.862 ms ┊ GC (min … max): 0.00% … 7.86%
Time (median): 169.471 ms ┊ GC (median): 0.00%
Time (mean ± σ): 178.152 ms ± 22.915 ms ┊ GC (mean ± σ): 1.48% ± 2.67%
▁▃▃▆▄▇█▁▃▂▆▁
▇████████████▇▄▃▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▃▅▇▃▃▆▄▅▃▃▁▃▁▁▁▃ ▃
159 ms Histogram: frequency by time 243 ms <
Memory estimate: 7.75 MiB, allocs estimate: 279394.
Beyond the call to `ogr_l_getnextfeature`, the last step taking some time is the `IGeometry` constructor.
@yeesian I'm not able to push the commit associated with my previous comment to the Enhance_ArchGDAL_types branch. I got the following message:
remote: error: GH006: Protected branch update failed for refs/heads/Enhance_ArchGDAL_types.
remote: error: At least 1 approving review is required by reviewers with write access.
To https://github.com/yeesian/ArchGDAL.jl.git
! [remote rejected] Enhance_ArchGDAL_types -> Enhance_ArchGDAL_types (protected branch hook declined)
error: failed to push some refs to 'https://github.com/yeesian/ArchGDAL.jl.git'
Do you know what I did wrong?
I've updated the branch protection settings now, can you try again? Thanks!
It still fails, but I have a different message:
remote: error: GH006: Protected branch update failed for refs/heads/Enhance_ArchGDAL_types.
remote: error: Changes must be made through a pull request.
To https://github.com/yeesian/ArchGDAL.jl.git
! [remote rejected] Enhance_ArchGDAL_types -> Enhance_ArchGDAL_types (protected branch hook declined)
error: failed to push some refs to 'https://github.com/yeesian/ArchGDAL.jl.git'
Shall I try a force push with lease?
> It still fails, but I have a different message
I have further relaxed the branch protection rules -- I'm so sorry for the inconvenience so far -- can you try yet again? I'll get to your PRs soon; have been on the road for the past few days.
Great, it works. I was able to push commit a146c79 above
Continuing to shave time off the layer to table conversion, I noticed that `_infergeomtype` could be optimized by modifying the convert functions between GDAL CEnum.Cenum types and ArchGDAL Enum types:
```julia
function convert(::Type{OGRwkbGeometryType}, gogtinst::GDAL.OGRwkbGeometryType)
    return OGRwkbGeometryType(Integer(gogtinst))
end
function convert(::Type{GDAL.OGRwkbGeometryType}, ogtinst::OGRwkbGeometryType)
    return GDAL.OGRwkbGeometryType(Integer(ogtinst))
end
```
The convert functions' code looks simpler and is faster this way, and this could be extended to other similar conversions between ArchGDAL Enum and GDAL CEnum.Cenum types.
This has the prerequisite of aligning the values assigned to the `OGRwkbGeometryType` enum instances with those of `GDAL.OGRwkbGeometryType`. It should not have any side effect.
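As a toy illustration (stand-in enums, not the real GDAL/ArchGDAL ones) of why that value alignment is a prerequisite for the constructor-based conversion above:

```julia
# If the integer values of the two enums are aligned, converting through the
# constructor preserves the meaning of each instance; with misaligned values
# the same one-liner would silently map one geometry kind to another.
@enum GDALwkb GDAL_wkbPoint=1 GDAL_wkbLineString=2
@enum ArchGDALwkb AG_wkbPoint=1 AG_wkbLineString=2

Base.convert(::Type{ArchGDALwkb}, x::GDALwkb) = ArchGDALwkb(Integer(x))

@assert convert(ArchGDALwkb, GDAL_wkbLineString) === AG_wkbLineString
```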
- With current convert functions:
```julia
julia> geom = AG.fromWKT("POINT (1 2)")
Geometry: POINT (1 2)

julia> @benchmark AG._infergeomtype($(geom.ptr))
BenchmarkTools.Trial: 10000 samples with 949 evaluations.
 Range (min … max):  83.511 ns … 209.497 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     97.937 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   95.838 ns ±  13.089 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █ ▆ ▂▂ ▄█▂▁▅ ▂▂ ▁▁▁▃▁▁ ▁ ▂                                   ▂
  █▂█▄██▆▆▅▄█▆██████████▇███████▇▇▇█▇▇▇▇▇██▆▆▇▇█▆▆▆▇▆▅▆▆▄▅▅▆▅▆ █
  83.5 ns        Histogram: log(frequency) by time       143 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
- With new convert functions:
```julia
julia> geom = AG.fromWKT("POINT (1 2)")
Geometry: POINT (1 2)
julia> @benchmark AG._infergeomtype($(geom.ptr))
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
Range (min … max): 18.790 ns … 78.432 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 19.063 ns ┊ GC (median): 0.00%
Time (mean ± σ): 19.519 ns ± 2.273 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▇█▄ ▅ ▁ ▂ ▂ ▁ ▂
███▅██▄█▃▅▄▃▄▁▁▄▃▁▃▁▃▁▁▁▁▁▃▁▁▁█▆█▁▁▁▁▁▁▁▁▁▃▁▁▆█▅█▆▆▇▆▅▅▆▆▆▇ █
18.8 ns Histogram: log(frequency) by time 29.9 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
```
Next, I will try to save time by implementing a specialized version of `Tables.eachcolumns`, to shave some time off the calls to `ArchGDAL.getfield`.
> Next, I will try to save time by implementing a specialized version of `Tables.eachcolumns`, to shave some time off the calls to `ArchGDAL.getfield`.
I have copied and specialized Tables.jl/src/fallback.jl and Tables.jl/src/utils.jl. I had to use GeneralizedGenerated.jl to solve a world age problem with generated functions. And now we are closing in on Shapefile.jl performance for layer to table conversion:
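For context, the world age problem mentioned above can be reproduced with plain `eval` (a generic Julia illustration, unrelated to the ArchGDAL code):

```julia
# A function created with eval at runtime lives in a newer "world" than the
# method that is currently running, so calling it immediately fails:
function broken()
    f = eval(:(x -> x + 1))
    return f(1)   # MethodError: ... method is too new to be called from this world context
end

# Base.invokelatest (or packages like GeneralizedGenerated.jl /
# RuntimeGeneratedFunctions.jl) works around this:
function fixed()
    f = eval(:(x -> x + 1))
    return Base.invokelatest(f, 1)   # returns 2
end
```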
julia> display(@benchmark DataFrame(AG.getFDPlayer(AG.read("../../tmp/road/road.shp"), 0)) seconds = 30)
display(@benchmark DataFrame(Shapefile.Table("../../tmp/road/road.shp")) seconds = 30)
BenchmarkTools.Trial: 181 samples with 1 evaluation.
Range (min … max): 146.779 ms … 251.964 ms ┊ GC (min … max): 0.00% … 6.67%
Time (median): 157.170 ms ┊ GC (median): 0.00%
Time (mean ± σ): 165.837 ms ± 26.339 ms ┊ GC (mean ± σ): 0.99% ± 2.04%
▂▇▅█▅▁▁ ▆▁▁
▅███████▇███▅▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▁▃▁▄▁▄▃▃▃▃ ▃
147 ms Histogram: frequency by time 251 ms <
Memory estimate: 6.32 MiB, allocs estimate: 134898.
BenchmarkTools.Trial: 194 samples with 1 evaluation.
Range (min … max): 146.609 ms … 164.070 ms ┊ GC (min … max): 0.00% … 7.49%
Time (median): 152.686 ms ┊ GC (median): 0.00%
Time (mean ± σ): 154.683 ms ± 5.141 ms ┊ GC (mean ± σ): 2.93% ± 3.05%
▂ ▄▃▂▅▃█▄▂▅ ▄▂ ▇ ▃
▃▁▅▁█▆▁█████████▇▆▅▆▃▇▅▁▁▁▁▁▁▁▅▁▁▁▁▃▃▅▅▆██████▇▇█▆█▆▃█▅▅▃▃▃▁▃ ▃
147 ms Histogram: frequency by time 164 ms <
Memory estimate: 36.71 MiB, allocs estimate: 243065.
@visr you mentioned in comment https://github.com/yeesian/ArchGDAL.jl/pull/238#issuecomment-928958407 that you had a case where layer to table conversion took a long time.
@mathieu17g we can discuss copies and such in a different issue. I still wanted to create one, because I saw that converting a layer to DataFrame for a large vector file took minutes.
Could you hand me the vector file example, to test it with this PR?
I believe in that case I was reading the HydroATLAS / BasinATLAS data. This is a direct link to the download: https://figshare.com/articles/dataset/HydroATLAS_version_1_0/9890531?file=20087237
One nice property of this dataset is that it has detail levels 1 to 12, which gradually get larger. I noticed that especially the higher levels started to get really slow.
@visr Ok thanks. I will test the PR with this file set
@visr I have reimplemented `Tables.columns` to avoid the long compilation time for cases like the HydroATLAS data:
1) Faster than Shapefile, uncompiled or compiled, starting from level 6
2) Never more than x2 slower uncompiled compared to Shapefile. The worst case is level 1: 4s vs 2s on my (8 year old) mac, which is still reasonable compared to current ArchGDAL master, which takes around 150s for level 1+
Uncompiled Shapefile: 7.719764 seconds (10.36 M allocations: 792.434 MiB, 6.68% gc time)
Compiled Shapefile: BenchmarkTools.Trial: 50 samples with 1 evaluation.
Range (min … max): 1.790 s … 2.263 s ┊ GC (min … max): 1.03% … 10.02%
Time (median): 1.979 s ┊ GC (median): 0.92%
Time (mean ± σ): 1.999 s ± 94.965 ms ┊ GC (mean ± σ): 1.11% ± 1.29%
▁▃▁█ ▁ ▃ ▁
▇▁▁▄▁▁▄▁▁▄▄▄▄▁▁▁▄▄▁████▄▄▁▄▄▄▁▁▄▄█▇█▄█▄▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
1.79 s Histogram: frequency by time 2.26 s <
Memory estimate: 501.84 MiB, allocs estimate: 5446511.
Uncompiled AG 0.7.4 : 155.592979 seconds (69.32 M allocations: 3.783 GiB, 0.67% gc time, 93.75% compilation time)
Compiled AG 0.7.4 : BenchmarkTools.Trial: 5 samples with 1 evaluation.
Range (min … max): 9.593 s … 9.887 s ┊ GC (min … max): 2.33% … 2.87%
Time (median): 9.750 s ┊ GC (median): 2.73%
Time (mean ± σ): 9.739 s ± 124.606 ms ┊ GC (mean ± σ): 2.68% ± 0.22%
█ █ █ █ █
█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁█ ▁
9.59 s Histogram: frequency by time 9.89 s <
Memory estimate: 443.71 MiB, allocs estimate: 10380858.
Uncompiled AG normal: 3.652381 seconds (11.98 M allocations: 291.840 MiB, 1.87% gc time, 38.22% compilation time)
Compiled AG normal: BenchmarkTools.Trial: 50 samples with 1 evaluation.
Range (min … max): 2.418 s … 7.823 s ┊ GC (min … max): 7.82% … 6.07%
Time (median): 2.496 s ┊ GC (median): 7.93%
Time (mean ± σ): 2.690 s ± 812.183 ms ┊ GC (mean ± σ): 7.66% ± 0.72%
█▄▄
███▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
2.42 s Histogram: log(frequency) by time 7.82 s <
Memory estimate: 179.96 MiB, allocs estimate: 10161324.
With the `@generated` read function:
Uncompiled AG FDP pg: 5.266566 seconds (11.17 M allocations: 492.128 MiB, 6.47% gc time, 3.22% compilation time)
Compiled AG FDP pg: BenchmarkTools.Trial: 50 samples with 1 evaluation.
Range (min … max): 1.458 s … 3.087 s ┊ GC (min … max): 1.19% … 0.74%
Time (median): 1.634 s ┊ GC (median): 1.31%
Time (mean ± σ): 1.694 s ± 244.873 ms ┊ GC (mean ± σ): 1.28% ± 0.13%
█▄ ▂
▅▁▁▇███▅█▆▆▅▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
1.46 s Histogram: frequency by time 3.09 s <
Memory estimate: 148.99 MiB, allocs estimate: 5379084.
I have kept all the `Tables.columns` reimplementation trials in the code of the commit below, as an archive in case we have to come back to them later.
I have dropped the use of immutable `Tuple`s and the line-by-line column type widening, in favor of mutable `Vector`s with wide types and post-parsing type narrowing, as sketched for the schema build in issue #223.
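A small sketch (generic Julia, with a hypothetical `narrow` helper) of the "mutable wide-typed `Vector`s first, type narrowing after parsing" approach described above:

```julia
# Parse into a deliberately wide column type first...
wide = Vector{Union{Missing,Nothing,Int64,Float64,String}}()
append!(wide, [1, 2, missing])

# ...then narrow each column to the tightest element type actually present.
narrow(col) = convert(Vector{mapreduce(typeof, Base.promote_typejoin, col)}, col)

narrowed = narrow(wide)
typeof(narrowed)   # Vector{Union{Missing, Int64}}
```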
For the feature reading, I have tried to reimplement `eachcolumns` from Tables.jl/src/utils.jl with runtime generated functions (GeneralizedGenerated.jl and RuntimeGeneratedFunctions.jl). But using plain generated functions and `getcolumn` turned out to be faster, both uncompiled and compiled. It is counter-intuitive, and maybe my use of GeneralizedGenerated.jl and RuntimeGeneratedFunctions.jl could be improved (which is why I left the code in the commit).
Shapefile 1st run: 173.921930 seconds (347.88 M allocations: 20.671 GiB, 11.56% gc time)
Shapefile 2nd run: 190.139144 seconds (342.95 M allocations: 20.386 GiB, 17.76% gc time)
AG FDP pg 1st run: 98.029999 seconds (353.07 M allocations: 9.472 GiB, 8.89% gc time, 0.19% compilation time)
AG FDP pg 2nd run: 93.194404 seconds (348.00 M allocations: 9.178 GiB, 8.46% gc time, 0.02% compilation time)
Uncompiled Shapefile: 2.908215 seconds (5.19 M allocations: 328.995 MiB, 11.32% gc time)
Compiled Shapefile: BenchmarkTools.Trial: 193 samples with 1 evaluation.
Range (min … max): 146.924 ms … 183.801 ms ┊ GC (min … max): 0.00% … 5.49%
Time (median): 156.321 ms ┊ GC (median): 4.59%
Time (mean ± σ): 155.529 ms ± 5.481 ms ┊ GC (mean ± σ): 2.91% ± 2.76%
▄ ▅▂▄▅▂▂ ▂ ▄▇ █▄▇
▅▃▅▅▅██████████▆▆▅▃▅▁▃▃▆▁▅▅▆▃██▆▅██▅███▅▆██▅▆▅▅▅▆▃▁▃▃▅▁▁█▁▃▃▃ ▃
147 ms Histogram: frequency by time 167 ms <
Memory estimate: 36.71 MiB, allocs estimate: 243067.
Uncompiled AG FDP pg: 2.505719 seconds (5.13 M allocations: 294.994 MiB, 3.47% gc time, 7.39% compilation time)
Compiled AG FDP pg: BenchmarkTools.Trial: 161 samples with 1 evaluation.
Range (min … max): 166.438 ms … 263.748 ms ┊ GC (min … max): 0.00% … 7.81%
Time (median): 178.569 ms ┊ GC (median): 0.00%
Time (mean ± σ): 187.196 ms ± 25.490 ms ┊ GC (mean ± σ): 1.39% ± 2.57%
█▃▅▂▇ ▄▃ ▂
▆██████████▆█▆▄▄▃▃▁▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▄▃▃▄▁▅▃▁▁▃▃▃▁▄▃▃▁▃▁▅ ▃
166 ms Histogram: frequency by time 260 ms <
Memory estimate: 8.17 MiB, allocs estimate: 292907.
Great that you were able to improve the performance so much! I don't really have experience with generated / RuntimeGeneratedFunctions / GeneralizedGenerated, so it's hard for me to comment on it, other than that it's probably nice if we don't have to add a dependency. Do any of the limitations of generated functions from Base cause issues, like the world-age issues or closures? What was the reason to look into those packages?
> Do any of the limitations of generated functions from Base cause issues, like the world-age issues or closures? What was the reason to look into those packages?
I had the world age issue when trying to call the `asXXX` functions directly.
But the overhead of calling `getindex` on a vector of `asXXX` functions at runtime appears to be somewhat mitigated by the use of `Base.@generated`, compared to using a FeatureDefn-tailored vectorized function `(field1, ..., fieldn) -> (astypeoffield1(field1), ..., astypeoffieldn(fieldn))` built with `@RuntimeGeneratedFunction`.
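For reference, a minimal sketch (toy converters, not ArchGDAL's `asXXX` accessors) of such a runtime-generated, layout-tailored function, using RuntimeGeneratedFunctions.jl:

```julia
using RuntimeGeneratedFunctions
RuntimeGeneratedFunctions.init(@__MODULE__)

# An expression tailored at runtime to one specific feature layout:
# field 1 read as a String, field 2 as a Float64, field 3 as an Int.
ex = :(f -> (string(f[1]), Float64(f[2]), Int(f[3])))
tailored = @RuntimeGeneratedFunction(ex)

tailored((1, 2, 3.0))   # ("1", 2.0, 3)
```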
So the use of `Base.@generated` functions is faster and simpler. I think it's good enough.
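And a comparable sketch (again toy converters) of the `Base.@generated` approach that was kept, where the per-field converters are resolved while the method body is generated and then indexed at runtime:

```julia
# Toy "feature": raw field values plus the field kinds carried as a type parameter
struct ToyFeature{KINDS}
    raw::Vector{Any}
end

asint(x) = Int(x)
asstr(x) = string(x)
const CONVERTERS = (int = asint, str = asstr)

@generated function getfieldvalue(f::ToyFeature{KINDS}, i::Integer) where {KINDS}
    convs = Any[]                     # converters resolved once, at code-generation time
    for k in KINDS
        push!(convs, CONVERTERS[k])
    end
    convs = Tuple(convs)
    return :($convs[i](f.raw[i]))
end

f = ToyFeature{(:int, :str)}(Any[1.0, "abc"])
getfieldvalue(f, 1)   # 1
getfieldvalue(f, 2)   # "abc"
```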
I'll make a PR on master for the new `Tables.columns` for `AbstractFeatureLayer`.
While repairing the Tables methods' unit tests, I had to revert from `stealgeom` to `getgeom`.
The impact on performance is limited for the HydroATLAS basin test files and the French main roads test file (see the test results below). FDP layer to table conversion is on par with or faster than (for big files) Shapefile.jl.
`stealgeom` could still be interesting:
- with an `OGR_F_StealGeometry(int iGeomField)` variant besides `OGR_F_StealGeometry()`
- if we add a `Table` object to ArchGDAL (as in Shapefile.jl), built directly on a file name and layer number, to avoid leaving a layer with missing (stolen) geometries after a call to `Tables.columns` or `Tables.rows`
Shapefile 1st run: 177.759011 seconds (347.88 M allocations: 20.670 GiB, 12.77% gc time)
Shapefile 2nd run: 186.639426 seconds (342.95 M allocations: 20.386 GiB, 17.35% gc time)
AG FDP pg 1st run: 103.441506 seconds (353.07 M allocations: 9.472 GiB, 9.15% gc time, 0.20% compilation time)
AG FDP pg 2nd run: 97.793840 seconds (348.00 M allocations: 9.178 GiB, 8.61% gc time, 0.01% compilation time)
Shapefile 1st run: 4.370634 seconds (10.36 M allocations: 792.293 MiB, 11.53% gc time)
Compiled Shapefile: BenchmarkTools.Trial: 16 samples with 1 evaluation.
Range (min … max): 1.782 s … 2.311 s ┊ GC (min … max): 0.00% … 14.18%
Time (median): 1.922 s ┊ GC (median): 4.01%
Time (mean ± σ): 1.944 s ± 113.983 ms ┊ GC (mean ± σ): 5.11% ± 3.67%
▃ █
▇▁▁▁▁▁▁▁▇▁▇▇▁█▇█▇▁▇▇▁▁▇▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
1.78 s Histogram: frequency by time 2.31 s <
Memory estimate: 501.84 MiB, allocs estimate: 5446511.
AG FDP pg 1st run: 4.993936 seconds (10.46 M allocations: 450.400 MiB, 2.49% gc time, 3.47% compilation time)
Compiled AG FDP pg: BenchmarkTools.Trial: 17 samples with 1 evaluation.
Range (min … max): 1.553 s … 1.979 s ┊ GC (min … max): 1.75% … 1.98%
Time (median): 1.839 s ┊ GC (median): 1.91%
Time (mean ± σ): 1.813 s ± 132.176 ms ┊ GC (mean ± σ): 1.95% ± 0.71%
█ ▁ ▁ ▁ █▁▁ █ ▁ ▁ ▁ ▁ █
█▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁███▁█▁▁▁█▁█▁▁█▁█▁▁▁▁▁█ ▁
1.55 s Histogram: frequency by time 1.98 s <
Memory estimate: 148.99 MiB, allocs estimate: 5379083.
#### French main roads file
```julia
Shapefile 1st run: 2.446287 seconds (5.19 M allocations: 329.013 MiB, 5.75% gc time)
Compiled Shapefile: BenchmarkTools.Trial: 192 samples with 1 evaluation.
Range (min … max): 147.897 ms … 170.355 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 157.005 ms ┊ GC (median): 4.41%
Time (mean ± σ): 156.399 ms ± 4.845 ms ┊ GC (mean ± σ): 2.94% ± 2.80%
▂ ▁ ▂ ▁▇█▁ ▂ ▁ ▁ ▂ ▄▄▂▁▂ ▅
▃▃▃▃▃▃█▅█▆█████████▃███▁▅▁▁▅▁▁▃▁▅▃█▅█▆▅▃█▆██████▃██▁█▆▃▅▅▅▆▁▅ ▃
148 ms Histogram: frequency by time 165 ms <
Memory estimate: 36.71 MiB, allocs estimate: 243067.
AG FDP pg 1st run: 2.479176 seconds (5.13 M allocations: 295.079 MiB, 4.28% gc time, 8.29% compilation time)
Compiled AG FDP pg: BenchmarkTools.Trial: 153 samples with 1 evaluation.
Range (min … max): 176.988 ms … 274.031 ms ┊ GC (min … max): 0.00% … 7.41%
Time (median): 187.198 ms ┊ GC (median): 0.00%
Time (mean ± σ): 196.996 ms ± 24.662 ms ┊ GC (mean ± σ): 1.46% ± 2.73%
▁█▇ ▂ ▁
▃▄█████▇▇█▄▅▅▇▃▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▃▃▃▃▃▃▃▃▃▃▃▁▃▃ ▃
177 ms Histogram: frequency by time 264 ms <
Memory estimate: 8.17 MiB, allocs estimate: 292898.
```
I'm really sorry, but I'm going through a period of bereavement due to a personal loss, and nominate @visr to review this PR in my absence. If he might be unavailable to do so, I'll review it when I'm in the appropriate state to do so.
(At a high level, performance matters to me: I'm partial to both this PR and https://github.com/yeesian/ArchGDAL.jl/pull/243 and will allow for them to introduce breaking/experimental changes since it's still pre-1.0 and might give some directional information/guidance on workflows and APIs that enable performance gains. I'd review with an eye towards complexity and maintainability, but my preferences are ultimately non-blocking and I'm willing to postpone them until after @mathieu17g has explored the space of generated functions and unexploited GDAL functions that are important for optimization and might have implications for the API.)
Really sorry to hear that @yeesian. Of course take all the time you need, I wish you strength during a difficult time.
I can probably help out a bit here with reviewing. As it is right now I don't fully understand the PR, but that also has to do with my limited experience with generated functions and since I didn't properly dive into the code yet. @mathieu17g hopefully when you come out of the (impressive) rabbit hole here the result will be sufficiently simple.
@yeesian, I'm so sorry for your loss. Thank you for having taken the time to share your views with us in those difficult times.
As for this PR and `@generated` functions: they are quite well adapted to the GDAL transposition to Julia for field manipulation. They fill the performance gap with what a C compiler would optimize on field manipulations when doing something like building a dataframe.
It remains to be seen if there is any gain outside this use case. If not, we could end up confining the new parameterized types and the associated generated functions to the use case of direct conversion from geo files to a table source.
@visr I have put the `Table` object and the Tables.jl interface methods in a single file (src/tables2.jl). Could you please have a look at it, and test its performance on your side? BenchmarkCI's judgement gives results a bit different from the ones I get locally, though the ordering is unchanged. Below is a copy of commit 2ca3a4c's results:
| ID | time | GC time | memory | allocations |
| --- | --- | --- | --- | --- |
| ["shapefile_to_table", "frenchroads_ArchGDAL"] | 285.500 ms (5%) | | 8.43 MiB (1%) | 373899 |
| ["shapefile_to_table", "frenchroads_ArchGDAL_new_Tables_interface"] | 238.089 ms (5%) | | 7.19 MiB (1%) | 279339 |
| ["shapefile_to_table", "frenchroads_ArchGDAL_new_Tables_interface_vsizip"] | 366.701 ms (5%) | | 7.20 MiB (1%) | 279381 |
| ["shapefile_to_table", "frenchroads_ArchGDAL_vsizip"] | 412.876 ms (5%) | | 8.43 MiB (1%) | 373941 |
| ["shapefile_to_table", "frenchroads_Shapefile"] | 187.883 ms (5%) | | 35.39 MiB (1%) | 243030 |
@mathieu17g I'm a little bit lost. It would be of great help to me if you could write a high level summary of (1) the issue that this PR is trying to fix, and (2) how the proposed changes help fix this issue. I have only a rough understanding about ArchGDAL Tables performance issues. If there is a simple change that solves the largest performance bottleneck, but leaves another 10% on the table, that may well be preferred.
I see you introduce new parametrized structs besides the original unparametrized ones. Is that for now to isolate the changes, or is there a reason we'd have to keep both around?
@visr sure, I will make a summary
@visr this PR tries to solve two issues and introduces some more limited enhancements:
1. dispatch of `getfield` on types: x2.5 speedup, but complex parametric types are introduced
2. `OGRwkbGeometryType` conversions between GDAL and ArchGDAL, used in the ArchGDAL `Geometry` and `IGeometry` constructors: x30 speedup

The parametric types introduced to solve the 1st issue can either (option 1) replace the ArchGDAL OGR related types, or (option 2) be confined to a `Table` struct.

Since `getfield` is mainly used in the ArchGDAL table interface, where we read and convert all fields of all features of a feature layer, the performance enhancement is mainly felt in the table interface. So option 2 allows limiting the overall ArchGDAL code complexity. The replacement of the ArchGDAL OGR related types could be considered later, should a new benefit arise. But if both @visr and @yeesian are comfortable with these parametric types, we can go for a replacement now.
Follow-ups besides the parametric types handling:
- get `OGR_F_StealGeometryEx(OGRFeatureH, int iGeomField)` added to GDAL (see issue), and update `stealgeom(::FDP_AbstractFeature{FD}, ::Integer)` accordingly
#### `getfield` dispatch on field types

`getfield` is called on `Feature`. The `Feature` type does not expose any information on the field types. I added this information to the `Feature` type through a feature-definition-like type parameter.
This gives:
- `FDP_xxx` types, "FDP" standing for Feature Definition Parameterized (see 1st comment §1.4.2 above for details)
- a generated `getfield` function, which brings a x2.5 speedup:
```julia
@generated function getfield(
    fdp_feature::FDP_AbstractFeature{FD},
    i::Integer,
) where {FD<:FDType}
    return quote
        return if !isfieldset(fdp_feature, i)
            nothing
        elseif isfieldnull(fdp_feature, i)
            missing
        else
            $(_get_fields_asfuncs(FD))[i+1](fdp_feature, i)
        end
    end
end
```
```julia
julia> layer = AG.getlayer(AG.read("benchmark/data/road/road.shp")) # You may have to run BenchmarkCI.judge() locally, to get the datafile available in ArchGDAL/benchmark/
       feature = iterate(layer)[1]
       @btime AG.getfield($feature, 0); nothing
  181.913 ns (1 allocation: 16 bytes)

julia> fdp_layer = AG.getFDPlayer(AG.read("benchmark/data/road/road.shp"))
       fdp_feature = iterate(fdp_layer)[1]
       @btime AG.getfield($fdp_feature, 0); nothing
  73.403 ns (0 allocations: 0 bytes)
```
Note that I tried to further enhance performance by dispatching on `Val{i}()`, but to no avail, since the time spent in the `Val` constructor is roughly the same as the time saved by avoiding the `getindex` in `getfield`.
When used in the table interface by a client via `Tables.getcolumn`, `Tables.eachcolumn` (SQLite.jl) or `Tables.eachcolumns` (DataFrames.jl), the speedup remains more or less the same depending on the case: x2.5
```julia
julia> file = "benchmark/data/road/road.shp" # You may have to run BenchmarkCI.judge() locally, to get the datafile available in ArchGDAL/benchmark/
"benchmark/data/road/road.shp"

julia> layer = AG.getlayer(AG.read(file))
       feature = iterate(layer)[1]
       (;names, types) = AG.gdal_schema(layer)
       cols = [Vector{types[i]}(undef, 1) for i in 1:(AG.ngeom(feature) + AG.nfield(feature))]
       fill_row = (val::Any, i::Int, nm::Symbol) -> cols[i][1] = val
       sch = Tables.Schema(names, nothing)
       @btime Tables.eachcolumn($fill_row, $sch, $feature)
  3.274 μs (18 allocations: 368 bytes)

julia> fdp_layer = AG.getFDPlayer(AG.read(file))
       fdp_feature = iterate(fdp_layer)[1]
       types = AG.gdal_schema(fdp_layer).types
       cols = [Vector{types[i]}(undef, 1) for i in 1:(AG.ngeom(fdp_feature) + AG.nfield(fdp_feature))]
       @btime AG.FDPf2c($fdp_feature, 1, $cols)
  1.416 μs (9 allocations: 224 bytes)
```
Note that I tried to save the time spent in `getindex` within `Tables.eachcolumn` and `Tables.eachcolumns`, this time via runtime generated functions (RuntimeGeneratedFunctions.jl and GeneralizedGenerated.jl, see src/tables_columns.jl), but to no avail. These runtime generated functions have a bigger overhead than the call to `getindex` when looping `getfield` over all the fields of a feature.
THIS SHOULD PROBABLY BE FURTHER INVESTIGATED
Conclusion:
@visr you raised the issue that with ArchGDAL ≤ v0.8.0, which uses the Tables.jl fallback function `Tables.columns` with `nothing` as the `Schema`, reading files with many fields was taking "minutes". On my mac, your case takes ~150s at the first run vs ~3s at the second:
```julia julia> using Pkg; Pkg.precompile(); Pkg.status("ArchGDAL") import ArchGDAL as AG using DataFrames; using BenchmarkTools; using Printf level = 1; file = "tmp/HydroATLAS_v10/BasinATLAS_Data_v10_shp/BasinATLAS_v10_shp/BasinATLAS_v10_lev$(@sprintf("%02i", level)).shp" @time DataFrame(AG.getlayer(AG.read(file), 0)) Status `~/.../Project.toml` [c9ce4bd3] ArchGDAL v0.8.0 `dev/ArchGDAL` 156.487261 seconds (55.14 M allocations: 3.081 GiB, 0.52% gc time, 99.42% compilation time) 10×295 DataFrame Row │ HYBAS_ID NEXT_DOWN NEXT_SINK MAIN_BAS DIST_SINK DIST_MAIN SUB_AREA UP_AREA PFAF_ID ⋯ │ IGeometr… Int64 Int64 Int64 Int64 Float64 Float64 Float64 Float64 Int64 ⋯ ─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 1 │ Geometry: wkbMultiPolygon 1010000010 0 1010000010 1010000010 0.0 0.0 2.99531e7 2.99531e7 1 ⋯ 2 │ Geometry: wkbMultiPolygon 2010000010 0 2010000010 2010000010 0.0 0.0 1.78589e7 1.78589e7 2 3 │ Geometry: wkbMultiPolygon 3010000010 0 3010000010 3010000010 0.0 0.0 1.29491e7 1.29491e7 3 4 │ Geometry: wkbMultiPolygon 4010000010 0 4010000010 4010000010 0.0 0.0 2.08274e7 2.08274e7 4 5 │ Geometry: wkbMultiPolygon 5010000010 0 5010000010 5010000010 0.0 0.0 1.08038e7 1.08038e7 5 ⋯ 6 │ Geometry: wkbMultiPolygon 6010000010 0 6010000010 6010000010 0.0 0.0 1.78535e7 1.78535e7 6 7 │ Geometry: wkbMultiPolygon 7010000010 0 7010000010 7010000010 0.0 0.0 1.59146e7 1.59146e7 7 8 │ Geometry: wkbMultiPolygon 8010000010 0 8010000010 8010000010 0.0 0.0 6.19709e6 6.19709e6 8 9 │ Geometry: wkbMultiPolygon 8010020760 0 8010020760 8010020760 0.0 0.0 1.17002e5 1.17002e5 3 ⋯ 10 │ Geometry: wkbMultiPolygon 9010000010 0 9010000010 9010000010 0.0 0.0 2.14672e6 2.14672e6 9 285 columns omitted ```
Raising the `SPECIALIZATION_THRESHOLD` and related values in Tables.jl/src/utils.jl (in `@nexprs`) above the number of fields (400 was tried for the basin shapefile level 1, which has around 300 fields) does not bring any performance enhancement:
```julia julia> using Pkg; Pkg.update(); Pkg.status("ArchGDAL") import ArchGDAL as AG using DataFrames; using BenchmarkTools; using Printf level = 1; file = "tmp/HydroATLAS_v10/BasinATLAS_Data_v10_shp/BasinATLAS_v10_shp/BasinATLAS_v10_lev$(@sprintf("%02i", level)).shp" @time DataFrame(AG.getlayer(AG.read(file), 0)) Updating registry at `~/.julia/registries/General` Updating git-repo `https://github.com/JuliaRegistries/General.git` No Changes to `~/.../Project.toml` No Changes to `~/.../Manifest.toml` Precompiling project... 18 dependencies successfully precompiled in 49 seconds (242 already precompiled) Status `~/.../Project.toml` [c9ce4bd3] ArchGDAL v0.7.4 `dev/ArchGDAL` 159.786486 seconds (61.80 M allocations: 3.383 GiB, 0.56% gc time, 99.45% compilation time) 10×295 DataFrame Row │ HYBAS_ID NEXT_DOWN NEXT_SINK MAIN_BAS DIST_SINK DIST_MAIN SUB_AREA UP_AREA P ⋯ │ IGeometr… Int64 Int64 Int64 Int64 Float64 Float64 Float64 Float64 I ⋯ ─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 1 │ Geometry: wkbMultiPolygon 1010000010 0 1010000010 1010000010 0.0 0.0 2.99531e7 2.99531e7 ⋯ 2 │ Geometry: wkbMultiPolygon 2010000010 0 2010000010 2010000010 0.0 0.0 1.78589e7 1.78589e7 3 │ Geometry: wkbMultiPolygon 3010000010 0 3010000010 3010000010 0.0 0.0 1.29491e7 1.29491e7 4 │ Geometry: wkbMultiPolygon 4010000010 0 4010000010 4010000010 0.0 0.0 2.08274e7 2.08274e7 5 │ Geometry: wkbMultiPolygon 5010000010 0 5010000010 5010000010 0.0 0.0 1.08038e7 1.08038e7 ⋯ 6 │ Geometry: wkbMultiPolygon 6010000010 0 6010000010 6010000010 0.0 0.0 1.78535e7 1.78535e7 7 │ Geometry: wkbMultiPolygon 7010000010 0 7010000010 7010000010 0.0 0.0 1.59146e7 1.59146e7 8 │ Geometry: wkbMultiPolygon 8010000010 0 8010000010 8010000010 0.0 0.0 6.19709e6 6.19709e6 9 │ Geometry: wkbMultiPolygon 8010020760 0 8010020760 8010020760 0.0 0.0 1.17002e5 1.17002e5 ⋯ 10 │ Geometry: wkbMultiPolygon 9010000010 0 9010000010 9010000010 0.0 0.0 2.14672e6 2.14672e6 286 columns omitted ```
The performance bottleneck at the first run is located in `Tables.add_or_widen!` (seen when profiling), probably around the column allocation machinery based on `Tuple`s.
PROBABLY NEEDS AN ISSUE TO BE RAISED IN TABLES.JL
If we force the usage of `Tables.add!` instead of `Tables.add_or_widen!`, by giving the `Tables.Schema` to `Tables.buildcolumns` in `Tables.columns`, the performance issue disappears (see below): 3.22s instead of ~150s on my mac:
```julia julia> using Pkg; Pkg.update(); Pkg.status("ArchGDAL") import ArchGDAL as AG using DataFrames; using BenchmarkTools; using Printf; using Tables level = 1; file = "tmp/HydroATLAS_v10/BasinATLAS_Data_v10_shp/BasinATLAS_v10_shp/BasinATLAS_v10_lev$(@sprintf("%02i", level)).shp" sch = Tables.Schema((Symbol(""), :HYBAS_ID, :NEXT_DOWN, :NEXT_SINK, :MAIN_BAS, :DIST_SINK, :DIST_MAIN, :SUB_AREA, :UP_AREA, :PFAF_ID, :ENDO, :COAST, :ORDER_, :SORT, :dis_m3_pyr, :dis_m3_pmn, :dis_m3_pmx, :run_mm_syr, :inu_pc_smn, :inu_pc_umn, :inu_pc_smx, :inu_pc_umx, :inu_pc_slt, :inu_pc_ult, :lka_pc_sse, :lka_pc_use, :lkv_mc_usu, :rev_mc_usu, :dor_pc_pva, :ria_ha_ssu, :ria_ha_usu, :riv_tc_ssu, :riv_tc_usu, :gwt_cm_sav, :ele_mt_sav, :ele_mt_uav, :ele_mt_smn, :ele_mt_smx, :slp_dg_sav, :slp_dg_uav, :sgr_dk_sav, :clz_cl_smj, :cls_cl_smj, :tmp_dc_syr, :tmp_dc_uyr, :tmp_dc_smn, :tmp_dc_smx, :tmp_dc_s01, :tmp_dc_s02, :tmp_dc_s03, :tmp_dc_s04, :tmp_dc_s05, :tmp_dc_s06, :tmp_dc_s07, :tmp_dc_s08, :tmp_dc_s09, :tmp_dc_s10, :tmp_dc_s11, :tmp_dc_s12, :pre_mm_syr, :pre_mm_uyr, :pre_mm_s01, :pre_mm_s02, :pre_mm_s03, :pre_mm_s04, :pre_mm_s05, :pre_mm_s06, :pre_mm_s07, :pre_mm_s08, :pre_mm_s09, :pre_mm_s10, :pre_mm_s11, :pre_mm_s12, :pet_mm_syr, :pet_mm_uyr, :pet_mm_s01, :pet_mm_s02, :pet_mm_s03, :pet_mm_s04, :pet_mm_s05, :pet_mm_s06, :pet_mm_s07, :pet_mm_s08, :pet_mm_s09, :pet_mm_s10, :pet_mm_s11, :pet_mm_s12, :aet_mm_syr, :aet_mm_uyr, :aet_mm_s01, :aet_mm_s02, :aet_mm_s03, :aet_mm_s04, :aet_mm_s05, :aet_mm_s06, :aet_mm_s07, :aet_mm_s08, :aet_mm_s09, :aet_mm_s10, :aet_mm_s11, :aet_mm_s12, :ari_ix_sav, :ari_ix_uav, :cmi_ix_syr, :cmi_ix_uyr, :cmi_ix_s01, :cmi_ix_s02, :cmi_ix_s03, :cmi_ix_s04, :cmi_ix_s05, :cmi_ix_s06, :cmi_ix_s07, :cmi_ix_s08, :cmi_ix_s09, :cmi_ix_s10, :cmi_ix_s11, :cmi_ix_s12, :snw_pc_syr, :snw_pc_uyr, :snw_pc_smx, :snw_pc_s01, :snw_pc_s02, :snw_pc_s03, :snw_pc_s04, :snw_pc_s05, :snw_pc_s06, :snw_pc_s07, :snw_pc_s08, :snw_pc_s09, :snw_pc_s10, :snw_pc_s11, :snw_pc_s12, :glc_cl_smj, :glc_pc_s01, :glc_pc_s02, :glc_pc_s03, :glc_pc_s04, :glc_pc_s05, :glc_pc_s06, :glc_pc_s07, :glc_pc_s08, :glc_pc_s09, :glc_pc_s10, :glc_pc_s11, :glc_pc_s12, :glc_pc_s13, :glc_pc_s14, :glc_pc_s15, :glc_pc_s16, :glc_pc_s17, :glc_pc_s18, :glc_pc_s19, :glc_pc_s20, :glc_pc_s21, :glc_pc_s22, :glc_pc_u01, :glc_pc_u02, :glc_pc_u03, :glc_pc_u04, :glc_pc_u05, :glc_pc_u06, :glc_pc_u07, :glc_pc_u08, :glc_pc_u09, :glc_pc_u10, :glc_pc_u11, :glc_pc_u12, :glc_pc_u13, :glc_pc_u14, :glc_pc_u15, :glc_pc_u16, :glc_pc_u17, :glc_pc_u18, :glc_pc_u19, :glc_pc_u20, :glc_pc_u21, :glc_pc_u22, :pnv_cl_smj, :pnv_pc_s01, :pnv_pc_s02, :pnv_pc_s03, :pnv_pc_s04, :pnv_pc_s05, :pnv_pc_s06, :pnv_pc_s07, :pnv_pc_s08, :pnv_pc_s09, :pnv_pc_s10, :pnv_pc_s11, :pnv_pc_s12, :pnv_pc_s13, :pnv_pc_s14, :pnv_pc_s15, :pnv_pc_u01, :pnv_pc_u02, :pnv_pc_u03, :pnv_pc_u04, :pnv_pc_u05, :pnv_pc_u06, :pnv_pc_u07, :pnv_pc_u08, :pnv_pc_u09, :pnv_pc_u10, :pnv_pc_u11, :pnv_pc_u12, :pnv_pc_u13, :pnv_pc_u14, :pnv_pc_u15, :wet_cl_smj, :wet_pc_sg1, :wet_pc_ug1, :wet_pc_sg2, :wet_pc_ug2, :wet_pc_s01, :wet_pc_s02, :wet_pc_s03, :wet_pc_s04, :wet_pc_s05, :wet_pc_s06, :wet_pc_s07, :wet_pc_s08, :wet_pc_s09, :wet_pc_u01, :wet_pc_u02, :wet_pc_u03, :wet_pc_u04, :wet_pc_u05, :wet_pc_u06, :wet_pc_u07, :wet_pc_u08, :wet_pc_u09, :for_pc_sse, :for_pc_use, :crp_pc_sse, :crp_pc_use, :pst_pc_sse, :pst_pc_use, :ire_pc_sse, :ire_pc_use, :gla_pc_sse, :gla_pc_use, :prm_pc_sse, :prm_pc_use, :pac_pc_sse, :pac_pc_use, :tbi_cl_smj, :tec_cl_smj, :fmh_cl_smj, :fec_cl_smj, :cly_pc_sav, :cly_pc_uav, :slt_pc_sav, 
:slt_pc_uav, :snd_pc_sav, :snd_pc_uav, :soc_th_sav, :soc_th_uav, :swc_pc_syr, :swc_pc_uyr, :swc_pc_s01, :swc_pc_s02, :swc_pc_s03, :swc_pc_s04, :swc_pc_s05, :swc_pc_s06, :swc_pc_s07, :swc_pc_s08, :swc_pc_s09, :swc_pc_s10, :swc_pc_s11, :swc_pc_s12, :lit_cl_smj, :kar_pc_sse, :kar_pc_use, :ero_kh_sav, :ero_kh_uav, :pop_ct_ssu, :pop_ct_usu, :ppd_pk_sav, :ppd_pk_uav, :urb_pc_sse, :urb_pc_use, :nli_ix_sav, :nli_ix_uav, :rdd_mk_sav, :rdd_mk_uav, :hft_ix_s93, :hft_ix_u93, :hft_ix_s09, :hft_ix_u09, :gad_id_smj, :gdp_ud_sav, :gdp_ud_ssu, :gdp_ud_usu, :hdi_ix_sav),(AG.IGeometry{AG.wkbMultiPolygon}, Int64, Int64, Int64, Int64, Float64, Float64, Float64, Float64, Int64, Int32, Int32, Int32, Int32, Float64, Float64, Float64, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Float64, Float64, Float64, Float64, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Float64, Float64, Float64, Float64, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Int64, Int64, Int32)) @time DataFrame(Tables.CopiedColumns(Tables.buildcolumns(sch, AG.getlayer(AG.read(file), 0)))) Updating registry at `~/.julia/registries/General` Updating git-repo `https://github.com/JuliaRegistries/General.git` No Changes to `~/.../Project.toml` No Changes to `~/.../Manifest.toml` Status `~/.../Project.toml` [c9ce4bd3] ArchGDAL v0.7.4 `dev/ArchGDAL` 3.229805 seconds (4.18 M allocations: 216.228 MiB, 13.59% gc time, 75.90% compilation time) 10×295 DataFrame Row │ HYBAS_ID NEXT_DOWN NEXT_SINK MAIN_BAS DIST_SINK DIST_MAIN SUB_AREA ⋯ │ IGeometr… Int64 Int64 Int64 Int64 Float64 Float64 Float64 ⋯ ─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────── 1 │ Geometry: wkbMultiPolygon 1010000010 0 1010000010 1010000010 0.0 0.0 2.99531e7 ⋯ 2 │ Geometry: wkbMultiPolygon 2010000010 0 2010000010 2010000010 0.0 
0.0 1.78589e7 3 │ Geometry: wkbMultiPolygon 3010000010 0 3010000010 3010000010 0.0 0.0 1.29491e7 4 │ Geometry: wkbMultiPolygon 4010000010 0 4010000010 4010000010 0.0 0.0 2.08274e7 5 │ Geometry: wkbMultiPolygon 5010000010 0 5010000010 5010000010 0.0 0.0 1.08038e7 ⋯ 6 │ Geometry: wkbMultiPolygon 6010000010 0 6010000010 6010000010 0.0 0.0 1.78535e7 7 │ Geometry: wkbMultiPolygon 7010000010 0 7010000010 7010000010 0.0 0.0 1.59146e7 8 │ Geometry: wkbMultiPolygon 8010000010 0 8010000010 8010000010 0.0 0.0 6.19709e6 9 │ Geometry: wkbMultiPolygon 8010020760 0 8010020760 8010020760 0.0 0.0 1.17002e5 ⋯ 10 │ Geometry: wkbMultiPolygon 9010000010 0 9010000010 9010000010 0.0 0.0 2.14672e6 287 columns omitted ```
Unfortunately, we cannot know the tightest schema for sure from a layer's feature definition (see the case of the Shapefile driver with LineString and MultiLineString). Therefore we would have to provide a generic, looser schema using:
- `Union{Missing, IGeometry}` for geometry fields
- `Union{Missing, Nothing, <field's DataType>}` for fields

and tighten it after the layer has been read. But this feature is not provided by Tables.jl. Therefore we have to implement `Tables.columns`, as mentioned in my first issue on ArchGDAL :-)
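For instance, such a loose schema could look like this (hypothetical column names, using ArchGDAL's `IGeometry` for the geometry column):

```julia
using Tables
import ArchGDAL as AG

# One geometry column plus two attribute columns, declared loosely enough to
# hold whatever the layer actually contains, to be narrowed after reading
loose_schema = Tables.Schema(
    (:geom, :id, :name),
    (Union{Missing,AG.IGeometry},
     Union{Missing,Nothing,Int64},
     Union{Missing,Nothing,String}),
)
```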
This leads to implementing, on the ArchGDAL.jl side and at least to handle the missing geometry case, something faster than the chain `buildcolumns(::Nothing, rowitr::T) where {T}` -> `_buildcolumns` -> `eachcolumns`/`add_or_widen!` -> `__buildcolumns`. I doubt that it could be done easily.
`Tables.columns(::AbstractFeatureLayer)` is implemented in this PR in src/tables.jl. Here is what it does:
- it builds the columns as a `Vector{Vector}` instead of a `Tuple{Tuple}` as in Tables.jl, for simplicity and because I have not identified any significant performance reduction
- values are read through `getcolumn`
- it returns a `NamedTuple`, which natively supports the Tables.jl interface; note that the `NamedTuple` prevents extremely wide tables with a number of columns > 67K, due to a Julia compiler limitation (should a need arise for larger tables, the `Table` struct would have to be modified to hold something larger)

With this function the uncompiled performance is back to an execution time similar to that of the Tables.jl fallback `Tables.columns` with a specified schema: 3.37s on my mac:
```julia julia> using Pkg; Pkg.update(); Pkg.status("ArchGDAL") import ArchGDAL as AG using DataFrames; using BenchmarkTools; using Printf level = 1; file = "tmp/HydroATLAS_v10/BasinATLAS_Data_v10_shp/BasinATLAS_v10_shp/BasinATLAS_v10_lev$(@sprintf("%02i", level)).shp" @time DataFrame(AG.getlayer(AG.read(file), 0)) Updating registry at `~/.julia/registries/General` Updating git-repo `https://github.com/JuliaRegistries/General.git` Updating `~/.../Project.toml` [c9ce4bd3] ~ ArchGDAL v0.7.4 `dev/ArchGDAL` ⇒ v0.8.0 `dev/ArchGDAL` Updating `~/.../Manifest.toml` [c9ce4bd3] ~ ArchGDAL v0.7.4 `dev/ArchGDAL` ⇒ v0.8.0 `dev/ArchGDAL` Precompiling project... 1 dependency successfully precompiled in 11 seconds (259 already precompiled) Status `~/.../Project.toml` [c9ce4bd3] ArchGDAL v0.8.0 `dev/ArchGDAL` 3.370118 seconds (5.11 M allocations: 281.298 MiB, 3.18% gc time, 76.13% compilation time) 10×295 DataFrame Row │ HYBAS_ID NEXT_DOWN NEXT_SINK MAIN_BAS DIST_SINK DIST_MAIN SUB_AREA UP ⋯ │ IGeometr… Int64 Int64 Int64 Int64 Float64 Float64 Float64 Fl ⋯ ─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────── 1 │ Geometry: wkbMultiPolygon 1010000010 0 1010000010 1010000010 0.0 0.0 2.99531e7 2. ⋯ 2 │ Geometry: wkbMultiPolygon 2010000010 0 2010000010 2010000010 0.0 0.0 1.78589e7 1. 3 │ Geometry: wkbMultiPolygon 3010000010 0 3010000010 3010000010 0.0 0.0 1.29491e7 1. 4 │ Geometry: wkbMultiPolygon 4010000010 0 4010000010 4010000010 0.0 0.0 2.08274e7 2. 5 │ Geometry: wkbMultiPolygon 5010000010 0 5010000010 5010000010 0.0 0.0 1.08038e7 1. ⋯ 6 │ Geometry: wkbMultiPolygon 6010000010 0 6010000010 6010000010 0.0 0.0 1.78535e7 1. 7 │ Geometry: wkbMultiPolygon 7010000010 0 7010000010 7010000010 0.0 0.0 1.59146e7 1. 8 │ Geometry: wkbMultiPolygon 8010000010 0 8010000010 8010000010 0.0 0.0 6.19709e6 6. 9 │ Geometry: wkbMultiPolygon 8010020760 0 8010020760 8010020760 0.0 0.0 1.17002e5 1. ⋯ 10 │ Geometry: wkbMultiPolygon 9010000010 0 9010000010 9010000010 0.0 0.0 2.14672e6 2. 287 columns omitted ```
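A bare-bones sketch (hypothetical names, no geometry handling or type narrowing) of the shape of such a `Tables.columns`-style conversion, filling mutable `Vector`s and returning a `NamedTuple`:

```julia
# rows is any iterator of indexable rows; names/types describe the layer schema
function columns_sketch(names::Vector{Symbol}, types::Vector{<:Type}, rows)
    cols = [Vector{T}() for T in types]             # one mutable Vector per column
    for row in rows
        for i in eachindex(cols)
            push!(cols[i], row[i])
        end
    end
    # A NamedTuple of column vectors natively satisfies the Tables.jl column interface
    return NamedTuple{Tuple(names)}(Tuple(cols))
end

columns_sketch([:id, :name], [Int, String], [(1, "a"), (2, "b")])
# (id = [1, 2], name = ["a", "b"])
```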
Conclusion: the performance issue at first run is solved for files similar to the HydroATLAS basin shapefiles (lots of fields): time is now on par with the standard Tables.jl fallback functions with a full `Tables.Schema` provided.
When the source does not have to be preserved, stealing the geometry is faster. The speedup increases with the geometry size, from x1.5 (Point) up to x500 (big MultiPolygon, as in HydroATLAS basin shapefile level 1):
```julia
julia> import ArchGDAL as AG
       using GDAL, BenchmarkTools, Printf

julia> file = "dev/ArchGDAL/test/data/point.geojson"
       layer = AG.getlayer(AG.read(file))
       @btime AG.IGeometry(GDAL.ogr_g_clone(GDAL.ogr_f_getgeometryref(feature.ptr))) setup=(feature = iterate(AG.copy($layer))[1])
       @btime AG.IGeometry(GDAL.ogr_f_stealgeometry(feature.ptr)) setup=(feature = iterate(AG.copy($layer))[1])
       nothing
  350.620 ns (3 allocations: 48 bytes)
  211.821 ns (3 allocations: 48 bytes)

julia> level = 1; file = "tmp/HydroATLAS_v10/BasinATLAS_Data_v10_shp/BasinATLAS_v10_shp/BasinATLAS_v10_lev$(@sprintf("%02i", level)).shp"
       layer = AG.getlayer(AG.read(file))
       @btime AG.IGeometry(GDAL.ogr_g_clone(GDAL.ogr_f_getgeometryref(feature.ptr))) setup=(feature = iterate(AG.copy($layer))[1])
       @btime AG.IGeometry(GDAL.ogr_f_stealgeometry(feature.ptr)) setup=(feature = iterate(AG.copy($layer))[1])
       nothing
  676.252 μs (3 allocations: 48 bytes)
  1.159 μs (3 allocations: 48 bytes)
```
Since the geometry left behind when stealing is a NULL geometry, `stealgeom` can only be used when the source layer is transient (see the example in src/tables2.jl on `Table`).
Unfortunately, this function is available only for the first geometry in the GDAL C interface, via `OGR_F_StealGeometry()`. An issue has been raised and accepted on the GDAL side; it is waiting for a PR to be proposed.
Below is a trial for issue #215, which is being tackled by @yeesian.
#### `convert` for `GDAL.OGRwkbGeometryType` and `ArchGDAL.OGRwkbGeometryType`
I have aligned the `UInt32` values assigned in the `OGRwkbGeometryType` enums between GDAL and ArchGDAL. This allows avoiding the `@convert` macro and instead calling the enum's constructor directly on values for conversion.
It brings a x30 speedup, propagated to `_infergeomtype` and to the `Geometry` and `IGeometry` constructors:
V0.8.0 `master` branch:
```julia
julia> @btime convert(AG.OGRwkbGeometryType, GDAL.wkbMultiPolygon)
       @btime convert(GDAL.OGRwkbGeometryType, AG.wkbMultiPolygon)
       nothing
  63.165 ns (0 allocations: 0 bytes)
  63.103 ns (0 allocations: 0 bytes)
```
`Enhance_ArchGDAL_types` branch:
```julia
julia> @btime convert(AG.OGRwkbGeometryType, GDAL.wkbMultiPolygon)
       @btime convert(GDAL.OGRwkbGeometryType, AG.wkbMultiPolygon)
       nothing
  2.744 ns (0 allocations: 0 bytes)
  0.025 ns (0 allocations: 0 bytes)
```
Thanks a lot @mathieu17g for taking the time to write down all your findings! I'll get back to it when I have some more time.
Hi @visr, did you have any time lately to look into this PR? I would prefer that we decide on this one before finishing PR #243.
In the meantime, I proposed a PR in GDAL to extend the geometry stealing performance gain beyond the first geometry. It is still under review. Once merged, I will see to propagating it all the way to ArchGDAL.
@mathieu17g thanks for the bump, and sorry I did not yet sit down to get the proper look that this work deserves. I hope to get to this soon.
I see https://github.com/OSGeo/gdal/pull/5196 landed, nice work there!
Regarding this bit of your writeup:
> PROBABLY NEEDS AN ISSUE TO BE RAISED IN TABLES.JL
Perhaps it's a good idea to go ahead and raise that issue, with a reproducible example for them to test the performance? I've seen the same functions show up in my earlier profiles, but couldn't really figure out how to improve it.
@visr I have proposed in https://github.com/JuliaData/Tables.jl/issues/273 a solution to overcome the performance issue at first run. But the current PR implementation of `Tables.columns` will still be faster, by a factor of x5 at first run and > x5 for subsequent runs (tested on HydroBASIN level 7).
Awesome, I tried your inlining change from https://github.com/JuliaData/Tables.jl/issues/273#issuecomment-1037054582 as well.
On level 4 BasinATLAS I see similar results: around a 100x speedup of compilation, and no real effect on runtime, only somewhat fewer allocations and less time in GC. Note this was just a simple repeated `@time` exercise.
# level 4, 1st run on columntable: 178.628898 seconds (55.54 M allocations: 3.079 GiB, 0.44% gc time, 98.50% compilation time)
# level 4, 2nd run on columntable: 2.119344 seconds (912.34 k allocations: 222.733 MiB, 3.29% gc time)
# level 4, 3rd run on columntable: 2.293723 seconds (912.34 k allocations: 222.733 MiB, 2.50% gc time)
# level 4, 4th run on columntable: 2.434573 seconds (917.14 k allocations: 223.011 MiB, 6.13% gc time, 0.31% compilation time)
# level 4, 5th run on columntable: 2.177187 seconds (912.34 k allocations: 222.733 MiB, 3.34% gc time)
# level 4, 1st run on inlined columntable: 4.830689 seconds (4.07 M allocations: 196.590 MiB, 1.73% gc time, 49.77% compilation time)
# level 4, 2nd run on inlined columntable: 2.478992 seconds (827.06 k allocations: 22.226 MiB)
# level 4, 3rd run on inlined columntable: 2.294762 seconds (827.06 k allocations: 22.226 MiB)
# level 4, 4th run on inlined columntable: 2.247424 seconds (827.06 k allocations: 22.226 MiB)
# level 4, 5th run on inlined columntable: 2.424489 seconds (827.06 k allocations: 22.226 MiB)
Do you want to go ahead and submit that change as a PR to Tables.jl? You'll probably get more feedback that way. And if it lands, we get a free 100x in ArchGDAL!
Good to know that this PR would still improve the situation further, by at least 5x. Though we'd have to weigh it against the added complexity of course; it's not a one line change that gives 100x. So let's start there?
@visr
> Do you want to go ahead and submit that change as a PR to Tables.jl? You'll probably get more feedback that way. And if it lands, we get a free 100x in ArchGDAL!
I have made the PR https://github.com/JuliaData/Tables.jl/pull/275 to solve the compilation performance issue https://github.com/JuliaData/Tables.jl/issues/273, and another one, seen during the investigation, on column ordering when using `dictcolumntable`: https://github.com/JuliaData/Tables.jl/issues/274
Great, thanks! Though perhaps better to split it into two separate PR's since the two changes are unrelated. The inlining change is straightforward. For the dicts I guess the main question is do they want to take on OrderedCollections.jl as a new dependency.
Ok, let's wait for some feedback from @quinnj.
Sorry for the slowness; I've been slammed w/ non-julia things lately. I'll try to dig in tonight.
It seems like a lot of work has gone into this PR. What's the current state? This shouldn't go to waste :)
Some of this is superseded by https://github.com/yeesian/ArchGDAL.jl/pull/316
There the enum integers are used in the type directly. It's simple and works. But we could also swap them to real types; that would be a minor change (but no performance improvement).
I tried to use this PR but it contains a lot of unrelated ideas. It should be cleanly separated into separate PRs
Ok, to me it looks as if #316 will be easier to merge and comes first. @mathieu17g can you somehow separate the ideas you had into separate PRs, which are more digestible on their own and hence easier to merge? E.g. one for table interface improvement, one for geometry stealing, one for better convert functions (if they can be separated like that). This can maybe be done best after #316 has been merged. Or do you have any objections against merging #316 and arguments for going further with your version (which looks more complicated after a superficial read)?
I think there's a lot of great ideas here (for not only improvements but also diagrams and benchmarking) that I'm happy to keep it as-is (without deleting or merging) and closing it only after we feel we've made progress on the items that we'd like to incorporate.
Here is a draft for review, demonstrating:
- the use of `ogr_f_stealgeometry`, for an additional 7% gain on layer to table conversion when there is only one geometry field
- a way to dispatch on type in `getfield` (to fix issue #246), which brings a 5% performance gain to the latter

Performance results:
- Current master
- With Shapefile.jl
- Current master with `ogr_f_stealgeometry` used in `getcolumn`
- With new OGR parametric types and with `ogr_f_stealgeometry` used in `getcolumn`
1. ArchGDAL OGR related types moved to parametric composite types
Trying to dispatch on types required:
1.1 Creation of two parametric singleton types for fields and geomfields, and of a featuredefn-like type alias:
FType
for fields, which is parameterized by a tuple of OGRFieldType
andOGRFieldSubType
GType
for geomfields, which is parameterized by an OGRwkbGeometryType
FDType
for featuredefn, which is a 2-tuple of a tuple of GType
and a tuple ofFType
Note 1: I have added convert functions from
FType
andDataType
which yields a more readable correspondence between Julia DataTypes and GDAL field type ids than the current separate convert functions from OGRFieldType
andOGRFieldSubType
toDataType
Note 2: I have also added a commented draft of new
Geometry
parametric singleon type (namedGeom
) based onGType
but not used it since I have not identified any benefit1.2 Duplication of most of OGR related composite types and parameterization with
FType
,GType
andFDType
The duplicated types have the
FDP
prefixFDP_GeomFieldDefn
andFDP_FieldDefn
types are parameterized byGType
andFType
FDP_FeatureDefn
,FDP_Feature
andFDP_FeatureLayer
types are parameterized byFDType
All of them have been set as subtypes of corresponding abstract supertypes
DUAL_AbstractXXX
, in order to limit the duplication of functions code when not necessary.Note: A plain FeatureDefn parametric composite type could later be used as type parameter for Feature and FeatureLayer
1.3 Modification of a few functions, used to show a layer and convert it to a table
getFDPlayer
to handle a feature layer data source with all the newly introduced parametric composite typesFDType
information, have just been extended to acceptDUAL_AbstractXXX
FDType
information, have been duplicated as generated functionsNote: In the event of the unlikely use case where there would be too many generated functions (a lot of different data sources with different featuredefn) for compilation to be performed in a reasonable time or even to succeed, the generated functions could be moved to optionally-generated functions. The fallbacks being the legacy functions
1.4 Side additions
Enum
type id for user convenience. Disabled in the PR because it is not convenient when developping using revise.jl because it redefines the related constants. I have not found a way to inhibit the latterI have generalized the
ownedby
field in feature layer related types which may serve as a basis to fix issue #215, but @yeesian is tackling this one. For all intents and purposes here are below the relationships which are proposed:Ownership relations graph
destroy
functions forFDP_XXX
types have been created accordingly2. Steal geometry instead of cloning for the first geometry in features
There appears that
Tables.rows(layer)
makes a copy of the layer.Since
ogr_f_stealgeometry
is faster than the composition ofogr_f_getgeomfieldref
andogr_g_clone
, I have modifiedTables.getcolumn
to use astealgeom
function instead of the currentgetgeom
function.Unfortunately,
ogr_f_getgeomfieldref
is implemented in GDAL C interface only for the default (first) geometry. Datasources with only one geometry per feature is the general case, but that could be an issue to raise in GDAL.3. Further performance optimizations to investigate, regarding the layer to table conversion performance
FDType
to avoid callingogr_f_getfieldindex
Count the number of calls toEDIT: Done only one extra call toogr_l_getnextfeature
to ensure there is no unnecessary call to ititerate
compared to the number of features when converting a layer to table