Open MilesCranmer opened 1 year ago
I've thought about this, too. I also asked if my colleague @thautwarm has any ideas on this. But it seems that eliminating the overhead would be impossible as long as we allow Julia's dynamic compilation feature.
A possible alternative with Julia 1.9's extensions could be building some known package that triggers >10 extensions (swapping the roles of the main package and the extension package 😆). But I don't yet know whether it's even a good idea...
I wonder if one could define an internal module that triggers extensions to load. Something like:
function load_ext()
    Base.MainInclude.eval(:(using PrimaryModule._ExtensionLoader))
end
And have the internal module _ExtensionLoader trigger an extension to load within PrimaryModule and overload the relevant functions with an eagerly loaded module.
But I’m not sure if this would work, as it assumes the user has loaded PrimaryModule, rather than having it as an indirect dependency…
Edit: never mind, I guess this package would already be loaded when the user loads PrimaryModule…
cc @mkitti in case you have ideas
Wait, would something like the following work?
julia> module A
           using Requires: @init, @require
           function f()
               Base.require(@__MODULE__, :Zygote)
           end
           @init @require Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f" using Zygote
       end
julia> A.Zygote
ERROR: UndefVarError: `Zygote` not defined
Stacktrace:
[1] getproperty(x::Module, f::Symbol)
@ Base ./Base.jl:31
[2] top-level scope
@ REPL[2]:1
julia> A.f()
Zygote
julia> A.Zygote.gradient
gradient (generic function with 1 method)
My god, I think it actually works. Here's a working example of lazily-loaded Zygote.jl with zero overhead on calls to Zygote.gradient: https://github.com/SymbolicML/DynamicExpressions.jl/blob/2e760980524e4424317bd9e194274e3e10381b3e/src/OperatorEnumConstruction.jl
Here are the relevant lines of the lazy loading part:
generate_diff_operators(::Any, ::Any) = error("`Zygote` not loaded.")
@init @require Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f" @eval begin
    include("zygote_interface.jl")
end
function OperatorEnum(; binary_operators, unary_operators, enable_autodiff=false)
    if enable_autodiff
        Base.require(@__MODULE__, :Zygote)
        Base.invokelatest(generate_diff_operators, binary_operators, unary_operators)
    end
end
Then, the contents of zygote_interface.jl:
import Zygote: gradient
function generate_diff_operators(
    binary_operators::Vector{Function}, unary_operators::Vector{Function}
)
    diff_bin = Function[]
    diff_una = Function[]
    for op in binary_operators
        diff_op(x, y) = gradient(op, x, y)
        push!(diff_bin, diff_op)
    end
    for op in unary_operators
        diff_op(x) = gradient(op, x)[1]
        push!(diff_una, diff_op)
    end
    return diff_bin, diff_una
end
We can see that Zygote.jl is not actually loaded at startup:
julia> @time_imports using DynamicExpressions
1.2 ms SuiteSparse
3.4 ms ArrayInterfaceCore
1.0 ms IfElse
33.8 ms Static
3.2 ms ArrayInterface
5.7 ms StaticArrayInterface
1.3 ms SIMDTypes
2.5 ms ManualMemory
4.8 ms LayoutPointers
2.3 ms CPUSummary
1.3 ms BitTwiddlingConvenienceFunctions
9.1 ms HostCPUFeatures
184.0 ms VectorizationBase
3.8 ms SLEEFPirates
1.2 ms UnPack
1.0 ms Adapt
38.3 ms OffsetArrays
1.7 ms StaticArrayInterface → StaticArrayInterfaceOffsetArraysExt
7.7 ms ThreadingUtilities
7.2 ms PolyesterWeave
2.2 ms DocStringExtensions
5.7 ms CloseOpenIntervals
136.6 ms LoopVectorization
10.0 ms MacroTools
127.3 ms DynamicExpressions 4.44% compilation time
generate_diff_operators still needs to be called with Base.invokelatest, but the actual expensive calls (Zygote.gradient) seem to be zero-overhead.
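To make the world-age constraint concrete, here is a minimal sketch in plain Julia with no Zygote or Requires involved. The names `lazy_kernel` and `define_and_call` are hypothetical, and `@eval` only stands in for the `include("zygote_interface.jl")` step that `@require` triggers. A method added while a function is already running is not visible to that function's direct calls, which is why the first call has to go through `Base.invokelatest`:

```julia
# Stub with no methods yet; the real method is added lazily below.
function lazy_kernel end

function define_and_call()
    # Adds a method while this function is already running (newer world age).
    @eval lazy_kernel(x) = 2x

    # A direct call still dispatches in the caller's older world, so it fails.
    direct = try
        lazy_kernel(1)
    catch err
        err
    end

    # invokelatest dispatches in the newest world and sees the new method.
    latest = Base.invokelatest(lazy_kernel, 1)
    return direct, latest
end

# Call once from top level and keep both outcomes for inspection.
direct, latest = define_and_call()
println(typeof(direct))
println(latest)
```

The direct call is captured as an exception, while the `invokelatest` call returns normally.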
Ah, damn. It seems to hit world-age issues when I run it inside AirspeedVelocity.jl (which wraps the code inside a module). So I guess this doesn't fix it.
I tend to believe a black-box function is the appropriate abstraction here: if you don't treat it as a black box, you'll get surprised somewhere.
The weird thing is that this seems to work in most contexts. Even if I wrap it in a module manually and try executing it from the REPL, it still works and appears to have zero overhead. It’s only when I run within AirspeedVelocity.jl that I get an error.
Perhaps it’s something to do with how AirspeedVelocity.jl imports the module twice: once at the top level, and once within a module for benchmarking…
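One possible explanation, which this thread does not confirm, is that the closures returned by the lazily loaded code carry a world age too: their call methods are created when the lazily loaded file is evaluated. At the REPL each new prompt runs in the latest world, so calling them works. If a wrapper performs the loading and the call inside one running function, the call can fail with a "method too new" error. The sketch below reproduces that behavior with hypothetical names (`make_kernel`, `load_lazily`, `build_and_call`), again using `@eval` as a stand-in for the real loading step:

```julia
# Stub with no methods yet; load_lazily adds the real one.
function make_kernel end

# Stand-in for Base.require plus include of the interface file.
function load_lazily()
    @eval make_kernel(op) = x -> op(x) + 1
    return nothing
end

function build_and_call(x)
    load_lazily()
    # Building the closure through invokelatest works...
    kernel = Base.invokelatest(make_kernel, sin)
    # ...but the closure's call method is newer than this running function,
    # so calling it directly here raises a MethodError.
    return try
        kernel(x)
    catch err
        err
    end
end

inside = build_and_call(0.0)      # exception captured inside the function
outside = make_kernel(sin)(0.0)   # a fresh top-level statement sees the new methods
println(typeof(inside))
println(outside)
```

If this is what happens in the benchmark harness, calling the closures through `Base.invokelatest`, or returning to top level between loading and calling, would avoid the error.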
Maybe a macro might be helpful here. You could use the macro to load the package just before calling the function.
An internal macro or user-facing?
I’m wondering if there is a way one could simply manually trigger an extension to load. It doesn’t seem like it would require LLVM hacking; the extensions seem to be organized by some Julia code.
Very useful package, thanks for putting it up. I was wondering if there is a way to eliminate all overhead from lazy-loaded calls?
I am trying to conditionally load Zygote.jl in my package. Package extensions are not practical here because I don't want downstream users to have to load Zygote.jl. I want my package to install Zygote for them, but only load it when necessary: gradients are not commonly used, so it makes sense to have conditional loading.
But since I use Zygote.jl-generated gradients as kernels, the overhead of Base.invokelatest quickly accumulates. Is there any way I can eliminate this completely?
Since I don't expect the function to change after the first call of Base.invokelatest, perhaps there is a way to tell the compiler it is free to inline the latest call? I tried recording the world age manually, and then using Base.invoke_in_world, as well as Core._call_in_world, but this sadly did not seem to improve things.
Perhaps this is not possible with Julia?
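A common way to keep this cost out of the hot path is a function barrier: pay for `Base.invokelatest` once around the whole kernel loop rather than once per gradient evaluation. Everything called inside that single `invokelatest` runs in the latest world with ordinary dispatch. This is a minimal sketch under stated assumptions, not code from the package: `lazy_gradient`, `load_backend`, `sum_kernel`, `slow_path`, and `fast_path` are hypothetical names, and the `@eval` definition stands in for a Zygote-generated gradient.

```julia
# Stub with no methods yet; load_backend adds the real one.
function lazy_gradient end

# Stand-in for lazily loading Zygote and generating a gradient function.
function load_backend()
    @eval lazy_gradient(x) = 2x
    return nothing
end

# Hot loop that receives the kernel as an argument (the function barrier).
function sum_kernel(f, xs)
    total = zero(eltype(xs))
    for x in xs
        total += f(x)   # ordinary dispatch on the concrete type of f
    end
    return total
end

# Pays the invokelatest cost once per element.
function slow_path(xs)
    total = zero(eltype(xs))
    for x in xs
        total += Base.invokelatest(lazy_gradient, x)
    end
    return total
end

# Pays the invokelatest cost once per batch; the loop runs in the latest world.
fast_path(xs) = Base.invokelatest(sum_kernel, lazy_gradient, xs)

load_backend()
xs = [1.0, 2.0, 3.0]
slow = slow_path(xs)
fast = fast_path(xs)
println(slow == fast)
```

Both paths compute the same result; how much overhead the barrier removes depends on how much work each `invokelatest` call covers, and it was not benchmarked here.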