oxinabox / DataDeps.jl

reproducible data setup for reproducible science
Other
151 stars 43 forks source link

Tweak type inference to improve TTFX #165

Closed Omar-Elrefaei closed 1 year ago

Omar-Elrefaei commented 1 year ago

I was digging through the dependencies of Dynare looking for low hanging fruit to improve our ttfx.

this PR seems to save 1.2s for one of our dependencies PATHsolver because they use DataDeps in their __init__.

here is a small sample test and the timing results of the first run in a fresh session.

module testMod
using DataDeps
function test()
    ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"
    register(DataDep("name", "msg", "www.example.com", Any))
    DataDeps.datadep"name"
end
end

on Master:

julia> include("testMod.jl")

julia> @time @eval testMod.test()
  1.700212 seconds (2.80 M allocations: 180.028 MiB, 7.15% gc time, 99.44% compilation time)

with the first commit (removing type specification):

julia> include("testMod.jl")

julia> @time @eval testMod.test()
  1.015055 seconds (1.56 M allocations: 100.482 MiB, 9.25% gc time, 99.05% compilation time)

with the both commits:

julia> include("testMod.jl")

julia> @time @eval testMod.test()
  0.504765 seconds (551.64 k allocations: 36.326 MiB, 15.40% gc time, 99.93% compilation time)

and for sake of completeness, with a simple PercompileTools block containing the same 3 lines:

julia> @time @eval testMod.test()
  0.020395 seconds (21.12 k allocations: 1.439 MiB, 98.18% compilation time)

But I'm not sure if it is ok to have code that runs on the first using DataDeps that hits the network and writes to disk.

Note: I'm using @time @eval because @time seems to ignore time that was spent on things like inference. @time @eval closely matches the actual wall-time.

oxinabox commented 1 year ago

Getting rid of the type parameters makese sense to me. Compiling optimized code when data download is going to utterly dwarf anything else is too much.

Our tests have kind of bit-rotted.