julia-vscode / SymbolServer.jl


Improve performance with less specialization #160

Closed timholy closed 4 years ago

timholy commented 4 years ago

This is "WIP" because it currently fails two tests (just for a handful of Core.Builltinand/or Core.Intrinsic functions, I think). I'd appreciate some help figuring out how to fix them.

This PR uses two main strategies to improve the package's performance: control over specialization and greater caching.

Several functions have to operate on arguments with a huge number of different types: any function that takes a function or a type as an argument might, in principle, be specialized for it but then used only once or a small handful of times. This PR marks such arguments with @nospecialize annotations to prevent them from being compiled many times over. Since this package has to use isa logic anyway, there is very little advantage to specialization and a big win in compile time from avoiding it.
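To illustrate with a minimal sketch (not code from this PR), the annotation looks like:

# Without @nospecialize, the compiler would generate a fresh specialization
# of `report` for every distinct f and T it is called with; with it, one
# compiled body handles them all.
function report(@nospecialize(f), @nospecialize(T::Type))
    isa(f, Core.Builtin) && return "builtin"
    return T <: Number ? "numeric" : "other"
end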

EDIT: the text below and much of the discussion below is no longer relevant to this PR (those issues will be resolved in a separate PR)

The other major strategy relies on caching. Here there are negatives, because we have to invalidate the cache if the LanguageServer has to process newly-added packages; I'm not sure if that ever happens, and if so where that invalidation should be done.

Of course, the advantage is performance. Two major bottlenecks were:

With all that said, let's get to the numbers. Here was my test:

using SymbolServer
using Images, Profile, ProfileView  # Revise was also loaded
env = SymbolServer.getenvtree()
@time SymbolServer.symbols(env)

# And then run it a second time in the same session, just for fun (not sure this is relevant)
env = SymbolServer.getenvtree()
@time SymbolServer.symbols(env)

And here's the timing results:

Run #   Time, master   Time, this PR   Gain
1       89.9s          11.3s           8x
2       65.1s          2.2s            30x

As you can see, the gains are pretty decent. I did the timing on top of #159, but I don't think that really changes this test very much.

ZacLN commented 4 years ago

Amazing stuff! I haven't gone through the code in great detail, but avoiding specialization seems a clear winner.

The other major strategy relies on caching. Here there are negatives, because we have to invalidate the cache if the LanguageServer has to process newly-added packages; I'm not sure if that ever happens, and if so where that invalidation should be done.

I'm not entirely clear from my scan of the PR at what level this invalidation happens - is this a reference to methods extended from another module? Do you mean that the addition of a package to an environment invalidates the caches for all packages?

isdefined(m, n) where n came from allnames. This was wasteful since it was iterating over all names defined throughout the entire session rather than just the names defined in a specific module.

This was a blunt-force attempt to ensure that relevant objects are attached to the ModuleStore regardless of whether the name exists within the module, i.e. we want to attach the method defined below to the cache of A:

module A
Base.SomeNonExportedFunctionofBase() = 1
end

Similarly, we want Base methods for Core.Int to be stored within Base's cache (though this doesn't necessitate iterating over all names). There are more efficient approaches to this than the one currently taken, but I think this PR stores all methods within the FunctionStore at the parent module? (Is this the link to the invalidation mentioned above?) See the sketch below for what I mean by ownership.
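For concreteness (a sketch, not our actual implementation): each Method records its defining module, so methods extending a Base function from elsewhere are directly distinguishable:

# Every Method carries the module it was defined in, independent of which
# module owns the function's name.
for m in methods(push!)
    @show m.module   # mostly Base, plus any module that extended push!
end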

I appreciate that none of this is covered in the tests...

timholy commented 4 years ago

I'm not entirely clear from my scan of the PR at what level this invalidation happens

Right now, it doesn't. What I meant to say (but didn't clearly spell out) was that if the LanguageServer has some persistence, and re-runs if the user adds packages to the environment, then to be safe the design here will require that you first empty the caches and start from scratch. But I guess you kind of do that anyway now, because you presumably call allnames() from scratch at some point. It's just that with this design we'd have to intentionally insert an empty!(...) somewhere in the code path.
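Something like this, where the names are hypothetical and this is just a sketch of the shape I mean:

# Hypothetical sketch: the cache is module-level state, and an environment
# change empties it so that the next symbols pass rebuilds from scratch.
const method_cache = Dict{Module,Vector{Method}}()

invalidate_cache!() = empty!(method_cache)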

regardless of whether the name exists within the module

Ah, interesting! I still suspect that there may be a more efficient way to do it; for example, we could go through all the method tables and then parcel the methods, once, into their separate modules-of-definition. I recently wrote a package, MethodAnalysis.jl, that covers some of the same "traversal" territory as this package, so it can be used to check the timing of that kind of operation. Using the same set of packages I used above,

julia> using Images, Profile, ProfileView  # Revise was also loaded
Gtk-Message: 16:13:15.647: Failed to load module "canberra-gtk-module"
Gtk-Message: 16:13:15.647: Failed to load module "canberra-gtk-module"

julia> using MethodAnalysis

julia> function parcel_methods()
           methdict = Dict{Module,Vector{Method}}()
           covered = Base.IdSet{Any}()
           visit() do item
               if isa(item, Function) || isa(item, Type)
                   item ∈ covered && return false   # don't descend any further
                   push!(covered, item)
                   return true                      # descend into the methods of this function/type
               elseif isa(item, Method)
                   list = get!(Vector{Method}, methdict, item.module)
                   push!(list, item)
                   return false                     # don't descend into MethodInstances etc.
               else
                   return true                      # descend by default (e.g., into modules)
               end
           end
           return methdict
       end
parcel_methods (generic function with 1 method)

julia> precompile(parcel_methods, ())
true

julia> @time methdict = parcel_methods()
WARNING: BaseTraits.IsFastLinearIndex is deprecated, use IsIndexLinear instead.
  likely near REPL[5]:1
⋮        # ...many more...
  9.769300 seconds (12.39 M allocations: 637.746 MiB, 1.23% gc time)
Dict{Module,Array{Method,1}} with 255 entries:
  Xorg_libXdmcp_jll            => Method[__init__() in Xorg_libXdmcp_jll at /home/tim/.julia/packages/Xorg_libXdmcp_jll/A64dc/src/wrappers/x86_64-linux-gnu.jl:27, eval(x) in Xorg_libXdmcp_jll at /home/ti…
  Xorg_xkeyboard_config_jll    => Method[__init__() in Xorg_xkeyboard_config_jll at /home/tim/.julia/packages/Xorg_xkeyboard_config_jll/G33Bc/src/wrappers/x86_64-linux-gnu.jl:12, eval(x) in Xorg_xkeyboar…
  AbstractFFTs                 => Method[mul!(y::AbstractArray, p::AbstractFFTs.ScaledPlan, x::AbstractArray) in AbstractFFTs at /home/tim/.julia/packages/AbstractFFTs/mhQvY/src/definitions.jl:269, getin…
  Base.Ryu                     => Method[append_c_digits(count, digits, buf, pos) in Base.Ryu at ryu/utils.jl:258, append_d_digits(olength, digits, buf, pos, decchar) in Base.Ryu at ryu/utils.jl:227, app…
⋮        # ...many more...

That 9.8s is a pretty big chunk of the 11.3s I quote above. However, 9.8s + 2.2s (the amount of time needed once all the caches are loaded) is just 12s, so I think this makes sense.

EDIT: oops, actually

julia> function parcel_methods()
           methdict = Dict{Module,Vector{Method}}()
           visit() do item
               if isa(item, Method)
                   list = get!(Vector{Method}, methdict, item.module)
                   push!(list, item)
                   return false                     # don't descend into MethodInstances etc.
               else
                   return true
               end
           end
           return methdict
       end

is fine, as MethodAnalysis already keeps track of which functions have been visited. It's no faster, because the total time is dominated by _methods, but very simple.

timholy commented 4 years ago

I appreciate that none of this is covered in the tests...

Fixing that might make it easier to try out new implementations with confidence. I'm on my third rewrite of Revise (https://github.com/timholy/Revise.jl/pull/497), and the main reason I can do it with reasonable confidence is that its test suite is about 3700 loc. And my main worry is that I still don't have adequate coverage...

Anyway, I know that writing a big test suite is a pain, but for Revise I've gotten to the point where I basically never fix a bug without first contributing a test that exposes the bug. Sometimes the test takes 3x as long to write as fixing the bug, but my sense is that it has on balance been the right decision.

ZacLN commented 4 years ago

I still suspect that there may be a more efficient way to do it; for example, we could go through all the method tables and then parcel the methods, once, into their separate modules-of-definition.

Absolutely, this was the motivation behind constructing the module tree prior to caching the symbols.

Tests are eternally on my todo list but this may spur me on!

Anyway, I'll have a look at the BuiltIn issue

ZacLN commented 4 years ago

I think the BuiltIns issue is due to parentmodule having different behaviour for non-generics:

Core.bitcast === Core.Intrinsics.bitcast
parentmodule(Core.bitcast) === Core
parentmodule(Core.Intrinsics.bitcast) === Core
(:bitcast in names(Core, all = true)) === false
(:bitcast in names(Core.Intrinsics, all = true)) === true
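So a workaround might be to trust name membership rather than parentmodule when deciding where a builtin belongs (a sketch, not tested against this PR):

# Sketch: decide ownership by name membership instead of parentmodule,
# which reports Core for both spellings of the intrinsic.
owns(mod::Module, n::Symbol) = n in names(mod, all = true)

owns(Core, :bitcast)              # false
owns(Core.Intrinsics, :bitcast)   # true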
timholy commented 4 years ago

Actually, now that I think about it a bit more, what would you think about using MethodAnalysis in this package? First, one can trivially show that the simple implementation above gets even the non-exported names:

julia> module A
       foo(x) = 1
       end
Main.A

julia> module B
       struct Foo end
       Main.A.foo(::Foo) = 2
       end
Main.B

julia> methdict = parcel_methods()
⋮ 

julia> methdict[Main.A]
[1] eval(x) in Main.A at REPL[3]:1
[2] foo(x) in Main.A at REPL[3]:2
[3] include(x) in Main.A at REPL[3]:1

julia> methdict[Main.B]
[1] foo(::Main.B.Foo) in Main.B at REPL[4]:3
[2] Main.B.Foo() in Main.B at REPL[4]:2
[3] eval(x) in Main.B at REPL[4]:1
[4] include(x) in Main.B at REPL[4]:1

More importantly, could you use its traversal of Core.MethodInstances to help improve your location info? I'm thinking of the problem of how you get to the right callee when you hit F12 at a call statement inside a method. Perhaps the most general way to figure it out would be to run type inference on the method's signature and use it to figure out who the callee is; however, in some circumstances that could be quite expensive. A cheaper alternative might be to check and see if inference has already been run for specific argument types. Here's a demo:

julia> methods(findnext)
# 14 methods for generic function "findnext":
[1] findnext(pat::Base.RegexAndMatchData, str, i) in Base at regex.jl:460
[2] findnext(re::Regex, str::Union{String, SubString}, idx::Integer) in Base at regex.jl:299
[3] findnext(r::Regex, s::AbstractString, idx::Integer) in Base at regex.jl:324
[4] findnext(pred::Base.Fix2{var"#s92",Bool} where var"#s92"<:Union{typeof(==), typeof(isequal)}, B::BitArray, start::Integer) in Base at bitarray.jl:1444
[5] findnext(testf::Function, B::BitArray, start::Integer) in Base at bitarray.jl:1454
[6] findnext(pred::Base.Fix2{var"#s92",var"#s91"} where var"#s91"<:AbstractChar where var"#s92"<:Union{typeof(==), typeof(isequal)}, s::String, i::Integer) in Base at strings/search.jl:5
[7] findnext(pred::Base.Fix2{var"#s92",var"#s91"} where var"#s91"<:Union{Int8, UInt8} where var"#s92"<:Union{typeof(==), typeof(isequal)}, a::Union{Array{Int8,1}, Array{UInt8,1}}, i::Integer) in Base at strings/search.jl:25
[8] findnext(testf::Function, s::AbstractString, i::Integer) in Base at strings/search.jl:127
[9] findnext(f::Function, v::SparseArrays.AbstractSparseArray, i) in SparseArrays at /home/tim/src/julia-master/usr/share/julia/stdlib/v1.6/SparseArrays/src/abstractsparse.jl:79
[10] findnext(testf::Function, A, start) in Base at array.jl:1776
[11] findnext(B::BitArray, start::Integer) in Base at bitarray.jl:1402
[12] findnext(t::AbstractString, s::AbstractString, i::Integer) in Base at strings/search.jl:277
[13] findnext(ch::AbstractChar, string::AbstractString, ind::Integer) in Base at strings/search.jl:296
[14] findnext(A, start) in Base at array.jl:1694

julia> m = methods(findnext).ms[10]
findnext(testf::Function, A, start) in Base at array.jl:1776

julia> using MethodAnalysis

julia> mis = Core.MethodInstance[]
Core.MethodInstance[]

julia> visit(m) do item
           isa(item, Core.MethodInstance) && (push!(mis, item); return false)
           return true
       end

julia> mis
15-element Array{Core.MethodInstance,1}:
 MethodInstance for findnext(::Function, ::Array{Any,1}, ::Int64)
 MethodInstance for findnext(::Base.Fix2{typeof(==),UnionAll}, ::Array{Any,1}, ::Int64)
 MethodInstance for findnext(::Base.var"#62#63"{typeof(isempty)}, ::Array{Pkg.Types.VersionRange,1}, ::Int64)
 MethodInstance for findnext(::Base.var"#408#410", ::Array{Any,1}, ::Int64)
 MethodInstance for findnext(::Sockets.var"#1#2", ::Array{Sockets.IPv4,1}, ::Int64)
 MethodInstance for findnext(::Pkg.Operations.var"#9#10"{Base.UUID}, ::Array{Pkg.Types.PackageSpec,1}, ::Int64)
 MethodInstance for findnext(::Pkg.Operations.var"#6#7"{Base.UUID}, ::Array{Pkg.Types.PackageSpec,1}, ::Int64)
 MethodInstance for findnext(::Pkg.Operations.var"#23#24"{Base.UUID}, ::Array{Pkg.Types.PackageSpec,1}, ::Int64)
 MethodInstance for findnext(::Pkg.Operations.var"#29#34"{Base.UUID}, ::Array{Pkg.Types.PackageSpec,1}, ::Int64)
 MethodInstance for findnext(::typeof(REPL.LineEdit._notspace), ::Array{UInt8,1}, ::Int64)
 MethodInstance for findnext(::Pkg.Operations.var"#113#119"{Nothing}, ::Array{Pkg.Types.PackageSpec,1}, ::Int64)
 MethodInstance for findnext(::Pkg.Operations.var"#113#119"{Base.UUID}, ::Array{Pkg.Types.PackageSpec,1}, ::Int64)
 MethodInstance for findnext(::Base.StackTraces.var"#1#2", ::Array{Base.StackTraces.StackFrame,1}, ::Int64)
 MethodInstance for findnext(::Base.Fix2{typeof(isequal),String}, ::Array{String,1}, ::Int64)
 MethodInstance for findnext(::Pkg.Operations.var"#113#119"{_A} where _A, ::Array{Pkg.Types.PackageSpec,1}, ::Int64)

This is a list of all the cases in which this method has already been called & compiled. For each of these, the inferred code is trivially (and very quickly) available:

julia> mi = mis[2]
MethodInstance for findnext(::Base.Fix2{typeof(==),UnionAll}, ::Array{Any,1}, ::Int64)

julia> @time src = Base._uncompressed_ir(mi.def, mi.cache.inferred)
  0.000048 seconds (134 allocations: 7.953 KiB)
CodeInfo(
    @ array.jl:1777 within `findnext'
   ┌ @ abstractarray.jl:102 within `keys'
   │┌ @ indices.jl:453 within `LinearIndices'
   ││┌ @ abstractarray.jl:75 within `axes'
   │││┌ @ array.jl:155 within `size'
1 ─││││       Base.arraysize(A, 1)::Int64
│  └└└└
│   @ array.jl:1778 within `findnext'
│  ┌ @ abstractarray.jl:102 within `keys'
│  │┌ @ indices.jl:453 within `LinearIndices'
│  ││┌ @ abstractarray.jl:75 within `axes'
│  │││┌ @ array.jl:155 within `size'
│  ││││ %2  = Base.arraysize(A, 1)::Int64
│  │││└
│  │││┌ @ tuple.jl:157 within `map'
│  ││││┌ @ range.jl:326 within `OneTo' @ range.jl:317
│  │││││┌ @ promotion.jl:409 within `max'
│  ││││││┌ @ int.jl:83 within `<'
│  │││││││ %3  = Base.slt_int(%2, 0)::Bool
│  ││││││└
│  ││││││ %4  = Base.ifelse(%3, 0, %2)::Int64
│  └└└└└└
│  ┌ @ indices.jl:489 within `last'
│  │┌ @ abstractarray.jl:95 within `axes1'
│  ││┌ @ abstractarray.jl:75 within `axes'
│  │││┌ @ tuple.jl:157 within `map'
│  ││││┌ @ range.jl:326 within `OneTo' @ range.jl:317
│  │││││┌ @ promotion.jl:409 within `max'
│  ││││││┌ @ int.jl:83 within `<'
│  │││││││ %5  = Base.slt_int(%4, 0)::Bool
│  ││││││└
│  ││││││ %6  = Base.ifelse(%5, 0, %4)::Int64
│  └└└└└└
│   @ array.jl:1779 within `findnext'
│  ┌ @ operators.jl:303 within `>'
│  │┌ @ int.jl:83 within `<'
│  ││ %7  = Base.slt_int(%6, start)::Bool
│  └└
└──       goto #3 if not %7
2 ─       return Base.nothing
    @ array.jl:1780 within `findnext'
3 ┄ %10 = φ (#1 => _4, #8 => %19)::Int64
└──       goto #9 if not true
    @ array.jl:1781 within `findnext'
   ┌ @ array.jl:811 within `getindex'
4 ─│ %12 = Base.arrayref(true, A, %10)::Any
│  └
│   %13 = (testf)(%12)::Any
└──       goto #6 if not %13
5 ─       return %10
    @ array.jl:1782 within `findnext'
   ┌ @ promotion.jl:398 within `=='
6 ─│ %16 = (%10 === %6)::Bool
│  └
└──       goto #8 if not %16
7 ─       goto #9
    @ array.jl:1784 within `findnext'
   ┌ @ abstractarray.jl:150 within `nextind'
   │┌ @ int.jl:87 within `+'
8 ─││ %19 = Base.add_int(%10, 1)::Int64
│  └└
└──       goto #3
    @ array.jl:1786 within `findnext'
9 ┄       return Base.nothing
)

It seems like it should be possible to use this to get hints about which specific methods of a callee get used in practice. Perhaps by comparing against multiple compiled instances you could get a prioritized list of candidates? Or you could have a UI tool that lets you select which of these MethodInstances you want to use for finding the correct callee?
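For instance, one crude way to build that prioritized list, assuming the mis vector collected above (a sketch, not a polished API):

# Tally the concrete type of the function argument across all compiled
# instances; frequently-seen callees are likelier F12 targets.
counts = Dict{Any,Int}()
for mi in mis
    ftype = Base.unwrap_unionall(mi.specTypes).parameters[2]  # type of testf
    counts[ftype] = get(counts, ftype, 0) + 1
end
candidates = sort!(collect(counts); by = last, rev = true)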

Even if you can solve the analysis problem, there will be some pain points. For example, the implementation of the above method is

function findnext(testf::Function, A, start)
    i = oftype(first(keys(A)), start)
    l = last(keys(A))
    i > l && return nothing
    while true
        testf(A[i]) && return i
        i == l && break
        # nextind(A, l) can throw/overflow
        i = nextind(A, i)
    end
    return nothing
end

and of the calls that appear in the source, you'll see the inferred code above links to keys, last, and nextind but leaves no breadcrumbs about first and oftype. This could be viewed as a Julia bug and something we might fix ourselves or at least put on the queue of people who know more about this than I do.

So another alternative might be to find them via their backedges. That's pretty quick too:

julia> findcallee(callee, mi) = Dict(reverse.(direct_backedges(callee)))[mi]
findcallee (generic function with 1 method)

julia> using BenchmarkTools

julia> @btime findcallee($first, $mi)
  830.165 μs (8092 allocations: 419.84 KiB)
MethodInstance for first(::LinearIndices{1,Tuple{Base.OneTo{Int64}}})

All you need for this is a list of the functions called by the method, plus a specific MethodInstance to resolve the dispatch. Actually, come to think of it, you can do better than creating a dict and then extracting a single element:

julia> function findcallee2(callee, mi)
           local meth
           found = false
           visit(callee) do item
               if isa(item, Method) && !found
                   meth = item
                   visit(meth) do methmi
                       if isa(methmi, Core.MethodInstance)
                           visit_backedges(methmi) do edge
                               edge === mi && (found = true)
                               edge === methmi  # follow back only one caller
                           end
                           false
                       else
                           true
                       end
                   end
               end
               item === callee
           end
           found ? meth : nothing
       end
findcallee2 (generic function with 1 method)

julia> @btime findcallee2($first, $mi)
  253.462 μs (1595 allocations: 198.83 KiB)
first(iter::LinearIndices{1,R} where R<:Tuple{AbstractUnitRange{Int64}}) in Base at indices.jl:487
aviatesk commented 4 years ago

Sorry for jumping in,

Right now, it doesn't. What I meant to say (but didn't clearly spell out) was that if the LanguageServer has some persistence, and re-runs if the user adds packages to the environment, then to be safe the design here will require that you first empty the caches and start from scratch.

Does this package really run user package code (in some Julia session)? I thought everything the LS tries to do is collect information from static parsing, and so no kind of invalidation (that is, a method body loaded into a running session becoming stale when some methods in its backedges get updated) can happen, no?

More importantly, could you use its traversal of Core.MethodInstances to help improve your location info?

So I guess the MethodInstances here would only be generated from types that can be obtained from (literally) the function signatures. Will that help?

ZacLN commented 4 years ago

We don't really have access to methods in this way. I'm not sure how much background you have on the general approach we take, so I'll try to summarise it briefly.

The key point to make is that we don't actually compile any user code (as it appears in workspace documents within the editor). Everything that is 'live' in this way is represented as a CST with annotations for bindings, scopes, and the references of symbols. This makes no use of the Julia lowerer/compiler, etc. (A future aim is to refactor the CST structure so as to make it more directly substitutable for Julia's AST, allowing us to use some subset of those Julia internals.) The SymbolServer cache then acts as the editor session's snapshot view of the outside world: all other code within the active environment that is not 'live'.

Continuing with the example above: in terms of what we could do to narrow down which particular method is being called in any given instance, we're far more restricted. We have a pseudo-inference pass that runs on user-level bindings, but nothing at the arbitrary-expression level.

Does that make sense?

ZacLN commented 4 years ago

Does this package really run user package code (in some julia session) ?

Yes, to create the caches the packages are loaded. This is done in a separate process to maintain that separation from the editor process.
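Roughly this shape, heavily simplified (the real worker script does much more; Example stands in for a package from the user's environment):

using Serialization

# The child process loads the package and serializes what it finds to
# stdout; the editor-side process only deserializes the result.
script = """
using Serialization
using Example
serialize(stdout, names(Example, all = true))
"""
io = open(`$(Base.julia_cmd()) -e $script`, "r")
syms = deserialize(io)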

pfitzseb commented 4 years ago

It's basically the same approach as the DocSeeker.jl cache, except that it pulls in more than just docstrings, @aviatesk.

davidanthoff commented 4 years ago

Actually, now that I think about it a bit more, what would you think about using MethodAnalysis in this package?

So one complication is that we don't really want to load any packages in the symbol server process for ourselves, because as soon as we do, we restrict in some way which package/version combos a user might use. The general philosophy here is that only packages from the user's environment get loaded in the symbol server process.

timholy commented 4 years ago

We don't really have access to methods in this way...

That makes sense. For people running in VS Code's or Atom's terminals, you could have a small server that sends serializations of MethodInstances, but I can understand why that might not be entirely desirable.

we don't really want to load any packages in the symbol server process

I'll finish this off sometime without MethodAnalysis. Next couple of days are pretty busy, but I should get to it by the end of the week. Feel free to run with it yourself, of course.

timholy commented 4 years ago

I think that before the caching part of this can be addressed, we should discuss #161. So for now I've dropped that from this PR and submitted just the @nospecialize annotations. Those are helpful, though nowhere near as important as the caching:

Run # (each with a fresh env)   Time, master   Time, nospec
1                               89.9s          77.3s
2                               65.1s          68.8s

Interesting that it seems to be a bit slower on the second run. But I'm not sure that's important in practice?

I also threw in more whitespace changes and some tweaks to the manual. Let me know if you'd prefer those to be in a separate PR.

davidanthoff commented 4 years ago

I also threw in more whitespace changes

We are now using the auto-formatting GitHub Actions, but we only turned it on after you opened this PR. So going forward, whatever whitespacing we do should hopefully be made uniform automatically by the bot :)

I'm going to hold this PR for a few days because we are in release phase right now and need to keep master clean from any important changes.

I think improving these caching times is great, but just be aware that we have plans to move the whole caching into the cloud, at which point the indexing performance will matter a lot less for the end-user experience :)

timholy commented 4 years ago

I'm going to hold this PR for a few days because we are in release phase right now and need to keep master clean from any important changes.

Totally understood. But to make sure it's clear: the only thing this PR now does is add @nospecialize annotations; nothing about the actual functionality is changed. (Well, I guess I also assert that Base.Docs.meta returns an IdDict{Any,Any}. But that seems to be true for all Julia versions, while inference only knows that for nightly.)
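For concreteness, the assertion is just a return-type annotation of this shape (illustrative, not the exact line from the PR):

# A type-assert lets inference rely on the concrete type even on Julia
# versions where it can't prove it on its own.
meta = Base.Docs.meta(Main)::IdDict{Any,Any}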

I think improving these caching times is great, but just be aware that we have plans to move the whole caching into the cloud, at which point the indexing performance will matter a lot less for the end-user experience :)

That will be great! I presume it will still be useful to have good performance for devved packages?

ZacLN commented 4 years ago

I presume it will still useful to have good performance for devved packages?

Absolutely

davidanthoff commented 4 years ago

Sorry, I should have said that we are keeping master blocked from any change right now, small or large, unless it is a super critical bug fix :) But not much longer, I think we'll open up for merging later today.

aviatesk commented 4 years ago

Let's merge this PR for now? The conflicts need to be resolved first, though.

davidanthoff commented 4 years ago

Yes, now would be a good time to merge. Anytime before 7/1, really :)

timholy commented 4 years ago

Should be ready now.

ZacLN commented 4 years ago

Timely! I've just made a PR with the earlier aspects of this one. Let's merge this first if possible, then I'll fix the resulting conflicts over there.

timholy commented 4 years ago

Thanks. I can't merge, but perhaps someone can do the honors.