Methods of a single function are missing or scattered across modules

timholy commented 4 years ago

Issue demonstration

In preparing to revise #160 a bit, I was trying to understand the logic a bit more. I created a fresh package depot with these "package" definitions:

A.jl:

module A

foo(x) = 1

end # module

B.jl:

module B

import A

struct Foo end
A.foo(::Foo) = 2

end # module

Each has its own Project.toml file and the standard package directory structure, with B's project file listing A in its [deps] section. Then if I cache symbols, I get this:

julia> env[:A][:foo].methods
1-element Array{SymbolServer.MethodStore,1}:
 SymbolServer.MethodStore(:foo, :A, "/tmp/pkgs/dev/A/src/A.jl", 3, Pair{Any,Any}[:x => SymbolServer.FakeTypeName(SymbolServer.VarRef(SymbolServer.VarRef(nothing, :Core), :Any), Any[])], Symbol[], SymbolServer.FakeTypeName(SymbolServer.VarRef(SymbolServer.VarRef(nothing, :Core), :Any), Any[]))

You'll notice there is only one method listed in the MethodStore, despite the fact that

julia> methods(A.foo)
# 2 methods for generic function "foo":
[1] foo(::B.Foo) in B at /tmp/pkgs/dev/B/src/B.jl:6
[2] foo(x) in A at /tmp/pkgs/dev/A/src/A.jl:3

Is it stored in B? To make sure I don't miss it (in case there's name-mangling to encode A.foo), let's look for "foo" anywhere in the name:

julia> ks = String.(keys(env[:B].vals));

julia> filter(s->occursin("foo", s), ks)
0-element Array{String,1}

Doesn't seem to be there.

If I change the definition of B.jl to

module B

import A: foo

struct Foo end
foo(::Foo) = 2

end # module

and start a fresh session and build the cache again, I get

julia> env[:A][:foo].methods
1-element Array{SymbolServer.MethodStore,1}:
 SymbolServer.MethodStore(:foo, :A, "/tmp/pkgs/dev/A/src/A.jl", 3, Pair{Any,Any}[:x => SymbolServer.FakeTypeName(SymbolServer.VarRef(SymbolServer.VarRef(nothing, :Core), :Any), Any[])], Symbol[], SymbolServer.FakeTypeName(SymbolServer.VarRef(SymbolServer.VarRef(nothing, :Core), :Any), Any[]))

julia> env[:B][:foo].methods
1-element Array{SymbolServer.MethodStore,1}:
 SymbolServer.MethodStore(:foo, :B, "/tmp/pkgs/dev/B/src/B.jl", 6, Pair{Any,Any}[Symbol("#unused#") => SymbolServer.FakeTypeName(SymbolServer.VarRef(SymbolServer.VarRef(nothing, :B), :Foo), Any[])], Symbol[], SymbolServer.FakeTypeName(SymbolServer.VarRef(SymbolServer.VarRef(nothing, :Core), :Any), Any[]))

That's better---it got both methods---but if you see a foo(3) in B then

julia> B.foo(3)
1

which demonstrates that it called the method defined in A, not the one defined in B. Meaning that all methods are available from any module that can access it. Might this cause a problem? To find out, I added

function bar()
    return foo(convert(Int, 3.0))
end

to B. Then I positioned my cursor somewhere in foo and hit F12, which takes me to foo(::Foo) rather than foo(::Any), and it doesn't give me a choice in a popup dialog. In contrast, positioning my cursor inside convert and hitting F12 takes me to a definition (the wrong one, but that's a separate issue) but gives me a dialog allowing me to manually choose other methods.

Proposed fix

Fixing this seems to require a bit of an architecture change. To me it seems that the most consistent representation would be to lump all methods for A.foo together in a single FunctionStore placed in A, rather than parcel them out to the modules in which they are defined. This is consistent with how Julia does it:

julia> B.foo === A.foo
true

julia> ft = typeof(A.foo)
typeof(A.foo)

julia> ftn = ft.name
typeof(A.foo)

julia> ftn.module     # A "owns" foo...
A

julia> m = methods(A.foo).ms[1]
foo(::B.Foo) in B at /tmp/pkgs/dev/B/src/B.jl:6

julia> m.module    # ...even though there are methods for it in B
B

When foo requires module-scoping for use in B (my first definition of B.jl), then I'd propose that foo should not be a recorded symbol in env[:B]. (Ideally, hitting F12 in a call A.foo(x) would grok the explicit module-scoping and find it in A.) In the second definition of B.jl, where B imported A.foo, I'd propose that env[:B][:foo] should return just VarRef(VarRef(A), :foo), and that all methods of A.foo be looked up in env[:A].

davidanthoff commented 4 years ago

We save this information by package in our cache. I think that is why each method is stored in the module where it is defined. If the user now uses a new environment where only package A is part of the env, we wouldn't want to load the method that was defined in B at all. If the user loads a new env with package A and C, where C adds yet another method to A.foo, we still want to be able to use the cached values for A.

Or another way to say this: we want to be able to load arbitrary different packages from our cache and combine them in such a way that they represent a given environment that the user might be using.

timholy commented 4 years ago

If the user now uses a new environment where only package A is part of the env, we wouldn't want to load the method that was defined in B at all.

That's reasonable, though you can look for unloaded modules and just exclude those methods at the time you're using it.

But even if you decide to store the methods in their host packages, there's still at least one bug here.

davidanthoff commented 4 years ago

If the user now uses a new environment where only package A is part of the env, we wouldn't want to load the method that was defined in B at all.

That's reasonable, though you can look for unloaded modules and just exclude those methods at the time you're using it.

But we would always need to store it also in B (or C or D) because we might have a situation where package A gets indexed and cached while the user loads an env that includes package A and C. Then the user uses an env that includes packages A and B, and in that case we won't re-index package A, but load the cached version. But that cached version of A now won't include the method for foo that is defined in B, so it needs to be stored in the cache for B to be loaded properly. And if we have to store that method in B in any case, we might as well only store it there :)

there's still at least one bug here

Yes, certainly!

ZacLN commented 4 years ago

Alongside David's point we also avoid having to recahce all dependencies of a package each time there's a change to the package where a function lives.

On the bug - I think we just need to visit each function once, only from its parent module, and then cache the methods to their respective modules tires at that point. This would also remove the unneeded repeated calls to methods (I think)

timholy commented 4 years ago

I see your point. In principle you can do one giant cache for the whole ecosystem, but presumably you have different caches for different versions of a package? It will indeed be better to store in the packages.

On the bug - I think we just need to visit each function once, only from its parent module, and then cache the methods to their respective modules tires at that point. This would also remove the unneeded repeated calls to methods (I think)

Yep, that's essentially what the original version of #160 did. As you say, a simpler way to do it is just keep an IdSet of all the functions you've visited and skip if you've already visited them. I can submit that if you want, but if you know what to do, go for it!

davidanthoff commented 4 years ago

presumably you have different caches for different versions of a package?

Yes, we have one cache file per version per package.

And the goal in the future is to just create these cache files in the cloud and then download them on demand as needed. They seem really quite small, so I'm fairly optimistic that we can get rid of the vast majority of indexing time that way.

timholy commented 4 years ago

It's more than just indexing time, though. If you can't have any "backward edges" in the graph (a link from A to B indicating that A.foo gets extendedby B), then when the user hits F12 you have to iterate over all :foo FunctionStores in all loaded modules.

Or alternatively, add that field to FunctionStore:

struct FunctionStore <: SymStore
    name::VarRef
    methods::Vector{MethodStore}
    doc::String
    extends::VarRef
+   extendedby::Vector{VarRef}
    exported::Bool
end

but populate it at the beginning when you parse the environment as a whole.

ZacLN commented 4 years ago

Yep, that exactly what we do! https://github.com/julia-vscode/SymbolServer.jl/blob/master/src/symbols.jl#L393-L408

timholy commented 4 years ago

Good. I was surprised though by the F12 behavior above. Why doesn't it give me the option to see A's method?

ZacLN commented 4 years ago

The bug as above - isdefined(B,:foo) == false so we don't lookup the method I think

timholy commented 4 years ago

Not in the version that does import A: foo:

julia> env[:B][:foo]
SymbolServer.FunctionStore(SymbolServer.VarRef(SymbolServer.VarRef(nothing, :B), :foo), SymbolServer.MethodStore[SymbolServer.MethodStore(:foo, :B, "/tmp/pkgs/dev/B/src/B.jl", 6, Pair{Any,Any}[Symbol("#unused#") => SymbolServer.FakeTypeName(SymbolServer.VarRef(SymbolServer.VarRef(nothing, :B), :Foo), Any[])], Symbol[], SymbolServer.FakeTypeName(SymbolServer.VarRef(SymbolServer.VarRef(nothing, :Core), :Any), Any[]))], "", SymbolServer.VarRef(SymbolServer.VarRef(nothing, :A), :foo), false)

Also:

julia> exts = SymbolServer.collect_extended_methods(env);

julia> exts[SymbolServer.VarRef(SymbolServer.VarRef(A), :foo)]
2-element Array{SymbolServer.VarRef,1}:
 SymbolServer.VarRef(nothing, :A)
 SymbolServer.VarRef(nothing, :B)

davidanthoff commented 3 years ago

I've for now classified this as a bug, because it seems there is still something we need to fix? But I read this issue quickly, just trying to organize our issues a bit more :) So if someone more involved in this issue has better update on the status of this, please let us know!

pfitzseb commented 3 years ago

I feel like we had some recent improvements to cases like this, but should probably try the examples here again.

julia-vscode / SymbolServer.jl

Methods of a single function are missing or scattered across modules #161

Issue demonstration

Proposed fix