llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.13k stars 12.02k forks source link

ObjectLinkingLayer::findSymbol is O(n) #27562

Closed yuyichao closed 5 years ago

yuyichao commented 8 years ago
Bugzilla Link 27188
Resolution FIXED
Resolved on May 21, 2019 14:30
Version trunk
OS Linux
CC @lhames,@weliveindetail

Extended Description

And therefore emitting n modules to run is O(n^2).

Ref https://github.com/JuliaLang/julia/issues/15619 also https://github.com/JuliaLang/julia/pull/15749#issuecomment-204827156

lhames commented 5 years ago

Hi Stefan,

Yep -- This is fixed as of ORCv2.

Before closing this ticket, one interesting piece of information would be how Orc v2 handles this. With m = #modules and n = #symbols (each), is the asymptotic complexity of symbol lookup still O(m*n)?

No. JITDylibs (new in ORCv2) contain hash maps of symbol names to addresses/compiler-closures. Lookup for a single symbol is now O(d), where d = #JITDylibs.

weliveindetail commented 5 years ago

This affects Orc v1 and it's unlikely to change. It may even be considered "expected behavior" right?

Before closing this ticket, one interesting piece of information would be how Orc v2 handles this. With m = #modules and n = #symbols (each), is the asymptotic complexity of symbol lookup still O(m*n)?

yuyichao commented 8 years ago

The solution is to cache symbol lookups in your JIT class. This can be done by holding a StringTable member in your class and searching this first, before falling back to lookups in the JIT.

Yes, we are doing that to work around this problem. Document the behavior and make it easier to do this correctly would be better. (Especially since AFAICT the tutorial example uses a similar naive approach).

lhames commented 8 years ago

Hi Yichao,

This is expected at the moment, but needs to be better documented, and there is room to improve infrastructure support for fast symbol lookups.

The reason that this is currently O(n) (and that we walk the objects from first added to last, rather than the other way around) is to match the usual lookup order of static linkers. The reason we don't pre-construct the symbol table by taking the first definition of a symbol as it's added is that definitions are finalized lazily the first time their address is requested (via getAddress), and we don't want to pre-build all those finalizers when most will end up being redundant.

The solution is to cache symbol lookups in your JIT class. This can be done by holding a StringTable member in your class and searching this first, before falling back to lookups in the JIT.

We can improve this further by adding a getSymbolTable method to the ObjectLinkingLayer: this would allow the entire symbol table for an object file to be added to the cached table when the module is emitted.

Cheers, Lang.

yuyichao commented 8 years ago

FWIW, it seems that inserting to the beginning instead of the end should be better? Assuming the modules are usually used in a similar order they are emitted?