johnnychen94 / LazyModules.jl

No, no, not now
MIT License

Eliminating overhead? #10

Open MilesCranmer opened 1 year ago

MilesCranmer commented 1 year ago

Very useful package, thanks for putting it up. I was wondering if there is a way to eliminate all overhead from lazy-loaded calls?

I am trying to conditionally load Zygote.jl in my package. Package extensions are not practical here because I don't want downstream users to have to load Zygote.jl. I want my package to install Zygote for them, but only load it when necessary: gradients are not commonly used, so it makes sense to have conditional loading.

But since I use Zygote.jl-generated gradients as kernels, the overhead of Base.invokelatest quickly accumulates. Is there any way I can eliminate this completely?

Since I don't expect the function to change after the first call of Base.invokelatest, perhaps there is a way to tell the compiler it is free to inline the latest call? I tried recording the world age manually and then using Base.invoke_in_world, as well as Core._call_in_world, but sadly this did not seem to improve things.
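For readers following along, here is a minimal sketch of the world-age barrier in question, with no Zygote involved. The names `kernel` and `load_then_call` are hypothetical, and `@eval` is only a stand-in for a lazy package load that adds methods. It shows a direct call failing inside the function that triggered the "load", while `Base.invokelatest` and `Base.invoke_in_world` both get through; both of those are dynamic calls, which is where the overhead comes from.

```julia
function kernel end  # declared up front so the name exists in every world

function load_then_call(x)
    # Stand-in for lazily loading a package that adds methods (hypothetical).
    @eval kernel(x) = 2x

    # A direct call still dispatches in the world age this function started in,
    # so the freshly added method is "too new" and a MethodError is thrown.
    direct_failed = try
        kernel(x)
        false
    catch err
        err isa MethodError || rethrow()
        true
    end

    # Dynamic call in the latest world.
    latest = Base.invokelatest(kernel, x)

    # Dynamic call pinned to an explicitly recorded world age.
    world = Base.get_world_counter()
    pinned = Base.invoke_in_world(world, kernel, x)

    return (; direct_failed, latest, pinned)
end

result = load_then_call(1.0)
```

Called again from the top level, the direct call would succeed, because each new top-level statement starts in the latest world.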

Perhaps this is not possible with Julia?

johnnychen94 commented 1 year ago

I've thought about this, too, and I asked my colleague @thautwarm for ideas. But it seems that eliminating the overhead is impossible as long as we allow Julia's dynamic compilation.

An alternative with Julia 1.9's extensions could be to build some known package that triggers more than 10 extensions (swapping the roles of the main package and the extension package 😆). But I don't know yet if it's even a good idea...

MilesCranmer commented 1 year ago

I wonder if one could define an internal module that triggers extensions to load. Something like:

function load_ext()
    Base.MainInclude.eval(:(using PrimaryModule._ExtensionLoader))
end

And have the internal module _ExtensionLoader trigger an extension to load within PrimaryModule and overload the relevant functions with an eagerly loaded module.

But I’m not sure if this would work as it assumes the user has loaded PrimaryModule, rather than having it as an indirect dependency…


Edit: never mind, I guess this package would already be loaded when the user loads PrimaryModule…

MilesCranmer commented 1 year ago

cc @mkitti in case you have ideas

MilesCranmer commented 1 year ago

Wait, would something like the following work?

julia> module A
           using Requires: @init, @require
           function f()
               Base.require(@__MODULE__, :Zygote)
           end
           @init @require Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f" using Zygote
       end

julia> A.Zygote
ERROR: UndefVarError: `Zygote` not defined
Stacktrace:
 [1] getproperty(x::Module, f::Symbol)
   @ Base ./Base.jl:31
 [2] top-level scope
   @ REPL[2]:1

julia> A.f()
Zygote

julia> A.Zygote.gradient
gradient (generic function with 1 method)

MilesCranmer commented 1 year ago

My god, I think it actually works. Here's a working example of lazily-loaded Zygote.jl with zero overhead on calls to Zygote.gradient: https://github.com/SymbolicML/DynamicExpressions.jl/blob/2e760980524e4424317bd9e194274e3e10381b3e/src/OperatorEnumConstruction.jl

Here are the relevant lines of the lazy loading part:

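# Fallback stub, used until Zygote has been loaded.
generate_diff_operators(::Any, ::Any) = error("`Zygote` not loaded.")
# Runs once Zygote is loaded: pulls in the real method definition.
@init @require Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f" @eval begin
    include("zygote_interface.jl")
end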

function OperatorEnum(; binary_operators, unary_operators, enable_autodiff=false)
    if enable_autodiff
        Base.require(@__MODULE__, :Zygote)
        Base.invokelatest(generate_diff_operators, binary_operators, unary_operators)
    end
end

Then, the contents of zygote_interface.jl:

import Zygote: gradient

function generate_diff_operators(
    binary_operators::Vector{Function}, unary_operators::Vector{Function}
)
    diff_bin = Function[]
    diff_una = Function[]

    for op in binary_operators
        diff_op(x, y) = gradient(op, x, y)
        push!(diff_bin, diff_op)
    end
    for op in unary_operators
        diff_op(x) = gradient(op, x)[1]
        push!(diff_una, diff_op)
    end
    return diff_bin, diff_una
end

We can see that Zygote.jl is not actually loaded at startup:

julia> @time_imports using DynamicExpressions
      1.2 ms  SuiteSparse
      3.4 ms  ArrayInterfaceCore
      1.0 ms  IfElse
     33.8 ms  Static
      3.2 ms  ArrayInterface
      5.7 ms  StaticArrayInterface
      1.3 ms  SIMDTypes
      2.5 ms  ManualMemory
      4.8 ms  LayoutPointers
      2.3 ms  CPUSummary
      1.3 ms  BitTwiddlingConvenienceFunctions
      9.1 ms  HostCPUFeatures
    184.0 ms  VectorizationBase
      3.8 ms  SLEEFPirates
      1.2 ms  UnPack
      1.0 ms  Adapt
     38.3 ms  OffsetArrays
      1.7 ms  StaticArrayInterface → StaticArrayInterfaceOffsetArraysExt
      7.7 ms  ThreadingUtilities
      7.2 ms  PolyesterWeave
      2.2 ms  DocStringExtensions
      5.7 ms  CloseOpenIntervals
    136.6 ms  LoopVectorization
     10.0 ms  MacroTools
    127.3 ms  DynamicExpressions 4.44% compilation time

generate_diff_operators still needs to be called with Base.invokelatest, but the actual expensive calls (Zygote.gradient) seem to be zero-overhead.
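For reference, a minimal sketch of the same idea without Zygote: pay for one `Base.invokelatest` at the boundary and make plain calls inside it. The names `kernel`, `hot_loop`, and `run_lazy` are hypothetical, and `@eval` is only a stand-in for the lazy load; this is an illustration under those assumptions, not the DynamicExpressions.jl code.

```julia
function kernel end  # stub; the real method is added at "load" time

# Plain, direct calls to `kernel`. They dispatch normally as long as this
# function runs in a world where the method already exists.
function hot_loop(xs)
    total = 0.0
    for x in xs
        total += kernel(x)
    end
    return total
end

function run_lazy(xs)
    # Stand-in for loading the heavy dependency (hypothetical).
    @eval kernel(x) = 2x
    # One dynamic call; the whole loop then runs in the latest world.
    return Base.invokelatest(hot_loop, xs)
end

total = run_lazy([1.0, 2.0, 3.0])
```

The cost of the dynamic call is paid once per `run_lazy` call rather than once per element, so it only helps when the hot code can sit behind such a boundary.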

MilesCranmer commented 1 year ago

Ah, damn. It seems to hit world-age issues when I run it inside AirspeedVelocity.jl (which wraps the code inside a module). So I guess this doesn't fix it.
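Here is a minimal sketch of one way such an error can arise; whether this is what happens in AirspeedVelocity.jl is not established here. The names `make_kernel`, `lazy_load!`, and `load_build_call` are hypothetical, and `@eval` stands in for the lazy load. A closure returned through `Base.invokelatest` has its call method defined in the new world, so calling it directly from a function that started before the load fails, while the same steps run as separate top-level statements work.

```julia
function make_kernel end  # stub; the real method arrives "lazily"

# Stand-in for loading Zygote and including the interface file (hypothetical).
function lazy_load!()
    @eval make_kernel(op) = y -> 2 * op(y)
    return nothing
end

# Load, build, and call inside a single function call:
# the caller's world age is fixed when the function is entered.
function load_build_call(x)
    lazy_load!()
    k = Base.invokelatest(make_kernel, sin)
    try
        return k(x)          # direct call to a closure defined in a newer world
    catch err
        err isa MethodError || rethrow()
        return :too_new
    end
end

inside_one_call = load_build_call(1.0)

# The same steps as separate top-level statements, as at the REPL:
# each statement starts in the latest world, so direct calls work.
k = make_kernel(sin)
from_top_level = k(1.0)
```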

johnnychen94 commented 1 year ago

I tend to believe a black-box function is the right abstraction here: if you don't treat it as a black box, you'll get surprised somewhere.

MilesCranmer commented 1 year ago

The weird thing is that this seems to work in most contexts. Even if I wrap it in a module manually and try executing it from the REPL, it still works and appears to have zero overhead. Only when I run it within AirspeedVelocity.jl do I get an error.

Perhaps it’s something to do with how AirspeedVelocity.jl imports the module twice: once at the top level, and once within a module for benchmarking…

mkitti commented 1 year ago

Maybe a macro might be helpful here. You could use the macro to load the package just before calling the function.

MilesCranmer commented 1 year ago

An internal macro or user-facing?

MilesCranmer commented 1 year ago

I’m wondering if there is a way one could simply trigger an extension to load manually. It doesn’t seem like it would require LLVM hacking; extension loading seems to be handled by ordinary Julia code.