yeesian / ArchGDAL.jl

A high level API for GDAL - Geospatial Data Abstraction Library
https://yeesian.github.io/ArchGDAL.jl/stable/
Other
141 stars 27 forks source link

Speedup conversion from types `<: DataType` or `<: UnionAll` to `CEnum.Cenum` or `Enum` type ids #262

Open mathieu17g opened 2 years ago

mathieu17g commented 2 years ago

@yeesian and @visr, I may have found a way to speedup conversion from OGRFieldType and OGRFieldSubType to DataType x200 times. This could notably speedup layer to table conversions.

In order to move the Dict lookup from runtime to compile time, we can create a real OGRFieldType instead of using directly a type id as we do currently, OGRFieldType being an Enum This would allow to use generated convert functions and simplify @convert macro by the way

I have added the following code mockup to types.jl:

struct Real_OGRFieldType{OGRFieldType} end
Real_OGRFieldType2DataType = Dict(
    Real_OGRFieldType{OFTInteger} => Int32,
    # ...
)
@generated function convert(
    ::Type{DataType},
    real_oft::Type{T},
) where {T<:Real_OGRFieldType}
    return :($(get(Real_OGRFieldType2DataType, T, nothing)))
end

nothing could be replaced to throw error when lookup fails

This gives:

julia> using ArchGDAL; const AG=ArchGDAL
ArchGDAL

julia> using BenchmarkTools

julia> convert(DataType, AG.Real_OGRFieldType{AG.OFTInteger})
Int32

julia> @code_lowered convert(DataType, AG.Real_OGRFieldType{AG.OFTInteger})
CodeInfo(
    @ /Users/Mathieu/.../dev/ArchGDAL/src/types.jl:299 within `convert'
   ┌ @ /Users/Mathieu/.../dev/ArchGDAL/src/types.jl within `macro expansion'
1 ─│     return Int32
   └
)

julia> @benchmark convert(DataType, AG.Real_OGRFieldType{AG.OFTInteger})
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  0.026 ns … 0.050 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     0.037 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   0.037 ns ± 0.002 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                           █   ▁             
  ▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▂▁▁▄▁▁▁▅▁▁▁▂▁▁▁▂▁▁▁▂▁▁▃▁▁▁█▁▁▁█▁▁▁▃▁▁▁▃▁▁▂ ▂
  0.026 ns       Histogram: frequency by time      0.041 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> convert(DataType, AG.OFTInteger)
Int32

julia> @code_lowered convert(DataType, AG.OFTInteger)
CodeInfo(
1 ─ %1 = Base.getindex(ArchGDAL.OGRFieldType_to_DataType_map, ft)
└──      return %1
)

julia> @benchmark convert(DataType, AG.OFTInteger)
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  8.242 ns … 58.932 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     8.314 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.754 ns ±  1.666 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▆▃▃ ▃▃  ▁     ▄                                           ▁
  ████▅██▄▄█▁▁▃▁▅█▁▁▃▃▁▁▃▁▁▁▄▁▁▃▁▁▁▁▃▁▁▃▄▁▁▁▁▁▁▁▁▁▁▁▃▁▅▅▆▆▇▇ █
  8.24 ns      Histogram: log(frequency) by time     18.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> 
yeesian commented 2 years ago

That sounds like a proposition to me with almost no downsides, I'm curious what led to your choice of types (compared with e.g. https://github.com/yeesian/ArchGDAL.jl/issues/246)? Is it for the machinery of generated functions?

mathieu17g commented 2 years ago

Well here is what I did:

  1. Examined @code_lowered output
  2. => need to get the same as the reverse conversion convert(::OGRFieldType, ::Int32)
  3. => need to have OGRFieldType as a real type to benefit from the Julia shortcut convert(::Type{Type}, x::Type) = x at https://github.com/JuliaLang/julia/blob/ae8452a9e0b973991c30f27beb2201db1b0ea0d3/base/essentials.jl#L206 , as in the reverse conversion
  4. Tried to use Val(oft) and dispatch on a subtype of something like struct Val{OGRFieldType} end => did not work
  5. Mimicking what has been done with IGeometry, I tried with struct NewOGRFieldType{OGRFieldType} end => it worked :-)

The @generated macro, which only works on types, could do the same as the @convert macro while being more concise. And therefore could be placed in types.jl file next to each conversion pairs collections.

Regarding the case of issue #246, maybe we could make getfield(::Feature, ::Int) specialize, if we had types looking like Feature{FD} where FD <: FeatureDefn{NT} where NT <: NamedTuple. Testing would tell.

mathieu17g commented 2 years ago

While raising this issue, I had not in mind the usage of convert(DataType, oft::OGRFielType), which I guess may only be a the end of asint, ..., asdatetime functions. I'm not even sure it is called there...

In those functions, the convert time must be less than 20% of the total time spent in getfield. We should then first address the issue #246, shouldn't we ? If so and if @yeesian, you have not yet started to work on it, shall I auto assign #246 to myself ?

EDIT 1: Looking at the output of @code_warntype below, it seems that convert(DataType, oft::OGRFielType) is not used asint, ..., asdatetime functions. I going to go on looking for where these convert(DataType, oft::OGRFielType) functions are used, to balance to potential gain with the one of fixing #246

julia> AG.read("test/data/point.geojson") do ds
           AG.getlayer(ds) do layer
               AG.getfeature(layer, 0) do f
                   @code_warntype GDAL.ogr_f_getfieldasstring(f.ptr, 1)
               end
           end
       end
Variables
  #self#::Core.Const(GDAL.ogr_f_getfieldasstring)
  arg1::Ptr{Nothing}
  arg2::Int64

Body::Union{Nothing, String}
1 ─ %1 = Base.cconvert(GDAL.OGRFeatureH, arg1)::Ptr{Nothing}
│   %2 = Base.cconvert(GDAL.Cint, arg2)::Int32
│   %3 = Base.unsafe_convert(GDAL.OGRFeatureH, %1)::Ptr{Nothing}
│   %4 = Base.unsafe_convert(GDAL.Cint, %2)::Int32
│   %5 = $(Expr(:foreigncall, :(Core.tuple(:OGR_F_GetFieldAsString, GDAL.libgdal)), Cstring, svec(Ptr{Nothing}, Int32), 0, :(:ccall), :(%3), :(%4), :(%2), :(%1)))::Cstring
│   %6 = GDAL.aftercare(%5, false)::Union{Nothing, String}
└──      return %6

julia> AG.read("test/data/point.geojson") do ds
           AG.getlayer(ds) do layer
               AG.getfeature(layer, 0) do f
                   @code_warntype AG.asstring(f, 1)
               end
           end
       end
Variables
  #self#::Core.Const(ArchGDAL.asstring)
  feature::ArchGDAL.Feature
  i::Int64

Body::String
1 ─ %1 = ArchGDAL.String::Core.Const(String)
│   %2 = GDAL.ogr_f_getfieldasstring::Core.Const(GDAL.ogr_f_getfieldasstring)
│   %3 = Base.getproperty(feature, :ptr)::Ptr{Nothing}
│   %4 = (%2)(%3, i)::Union{Nothing, String}
│   %5 = Base.convert(%1, %4)::String
│   %6 = Core.typeassert(%5, %1)::String
└──      return %6

EDIT 2: I have changed this line in utils.jl in @convert macro with:

(eval(T2) isa Type{DataType}) && (eval(T1) <: OGRFieldType) || push!(

to drop the convert(DataType, oft::OGRFielType) functions.

Running Pkg.test("ArchGDAL"), reveals that only specific unit tests are failing. https://github.com/yeesian/ArchGDAL.jl/blob/17b7e7d1200a6aead977a0e74ebd6e70d2283111/test/test_convert.jl#L64-L65 https://github.com/yeesian/ArchGDAL.jl/blob/17b7e7d1200a6aead977a0e74ebd6e70d2283111/test/test_types.jl#L24 https://github.com/yeesian/ArchGDAL.jl/blob/17b7e7d1200a6aead977a0e74ebd6e70d2283111/test/test_types.jl#L29 So these conversion functions may be useless. The move of the dictionary lookup from runtime to compile time might be useful for some other conversions, but I propose to tackle the issue #246 first

yeesian commented 2 years ago

So sorry for my late response, I was dealing with a lot recently. I have not started working on #246 and have assigned it to you instead, thank you!