sourcegraph / scip-clang

Apache License 2.0
53 stars 7 forks source link

Parallelize index merging #139

Open varungandhi-src opened 1 year ago

varungandhi-src commented 1 year ago

Preliminary numbers (based on indexing LLVM):

Build Forward decls TU count Indexing time (Ti) (s) Merging time (Tm) (s) Ti/Tm
Dev No 100 16.1 0.6 26.8
Dev No 1000 363.9 12.9 28.2
Dev No 4564 1503.8 39.9 37.7
Opt No 4564 270.8 9.3 29.1
Opt WIP 4564 306.6 90.9 3.4
Opt No 6371 1081.9 17.7 61
Opt Final 6371 1137.1 105.9 10.7

(The 100/1000 scaling figure should not be taken literally due to varying amounts of code across TUs.)

This is for a 10 core machine (dev MacBook Pro). The ratio will go down for higher core count machines.

We should be able to parallelize merging based on documents.

For communicating between threads, we could probably use https://github.com/rigtorp/SPSCQueue

varungandhi-src commented 1 year ago

Writing down some design notes here, because I think we should do this, but there are non-trivial data dependencies.

(Nw = number of workers)

One potential design which avoids fine-grained synchronization:

  1. Loop over main indexes, work-stealing parallel-map over documents, accumulate pair<SymbolName, PointerUnion<SymbolInformation *, SymbolInformationBuilder *>> vectors in a Nw x Nw matrix, rows for workers, columns based on hash(SymbolName) % Nw.
  2. Parallel-map over matrix columns, accumulating into an NWorker-length vector of (SymbolName -> PointerUnion<SymbolInformation *, SymbolInformationBuilder *>) maps based on hash(SymbolName) % N.
  3. Work-stealing parallel-map over forward decl indexes, accumulate pair<SymbolName, SymbolInformation *> vectors in an Nw x Nw matrix similar to before.
  4. Parallel-map over forward decl matrix columns. Each worker only performs lookups and modifies pointers based on the matching index in the vector created in step 2. Since the modified pointers are disjoint, there is no need for synchronization.

We could perhaps have some abstraction similar to the WorkerPool in Sorbet, utilizing an SPMC/MPMC queue for implementing work-stealing.

Potential tweaks to above: