grasph / AntiKt.jl

3 stars 3 forks source link

Make AntiKt.PseudoJet and HistoryElement immutable #1

Closed oschulz closed 1 year ago

oschulz commented 1 year ago

This turns AntiKt.PseudoJet and HistoryElement immutable. It doesn't speed up the algorithm itself, but reduces memory allocation significantly:

Before:

julia> @benchmark antikt.(events, R, ptmin)
BenchmarkTools.Trial: 160 samples with 1 evaluation.
 Range (min … max):  29.220 ms … 35.208 ms  ┊ GC (min … max): 0.00% … 11.49%
 Time  (median):     30.464 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   31.331 ms ±  1.620 ms  ┊ GC (mean ± σ):  4.11% ±  4.71%

After:

BenchmarkTools.Trial: 159 samples with 1 evaluation.
 Range (min … max):  30.142 ms … 37.150 ms  ┊ GC (min … max): 0.00% … 3.74%
 Time  (median):     31.688 ms              ┊ GC (median):    3.63%
 Time  (mean ± σ):   31.560 ms ±  1.001 ms  ┊ GC (mean ± σ):  2.02% ± 1.96%

The memory allocation cost of mutable structs is higher in Julia than in C++: An array of fixed-sized mutable structs is only a single memory allocation in C++, but in Julia, every element of the array must be allocated on the heap separately (this is true for pretty much any language with automatic garbage collection). "Small" structs should be immutable in Julia.

This becomes especially relevant when running on many threads, since threads can't do heap-(de-)allocation in parallel (both in Julia and C++, also Julia doesn't to parallel GC yet), so the alloc/gc-percentage goes up as the number of threads increases, until alloc/GC-cost dominates.

The remaining alloc/GC cost in antikt is mainly due to the mutable TiledJet. Using indices instead of pointers for TiledJet-references would make it possible to make TiledJet immutable as well, reducing GC cost to almost zero. ArraysOfArrays.VectorOfArrays could be used to reduce memory allocations for the vector-of-vectors-of-jets that antikt returns.

In general, using graph data structures based on adjacency lists (e.g. via Graphs.jl) would probably be better than using pointer/reference-based graph structures.

grasph commented 1 year ago

Hello @oschulz. By making PseudoJet immutable you have dropped the on-demand calculation of rapidity and phi. It results in a 10% increase of runtime.

Is your point of individual field allocation also true for struct containig only PODs?

The allocation numbers I get for one run on the 100 events are:

Without this PR:

With the PR:

Keeping only changed on HistoryElement:

So making PseudoJet immutable, reduces the number of allocations, although it increases the amount of allocated memory.

I've tried with mutable fields (using 0-dim array) for eta and phi and the results are worse than in all other configurations.

Philippe.

oschulz commented 1 year ago

By making PseudoJet immutable you have dropped the on-demand calculation of rapidity and phi. It results in a 10% increase of runtime.

Yes, I did, though in my benchmarks I didn't see a significant performance penalty due to that. But then, effects on the order of 10% are tricky to benchmark because they can depend on the data size, caching effects, multi- vs. single-threaded execution, etc.

The rapidity and phi calculation looked fairly cheap, so I though it might be easiest to not cache them. In principle we could implement a caching for them while keeping PseudoJet immutable, I think, but it may not be worth it (would require more in-depth benchmarking) - it could even lead to worse performance in some scenarios due to increased branch-prediction misses in the CPU. It will also depend on how often these values are requested for the same jet, and for what percentage of jets they don't need to be calculated at all, of course.

Is your point of individual field allocation also true for struct containig only PODs

Julia (pretty much any language with automatic GC) needs to allocate any mutable struct on the heap. Immutable structs that have mutable structs as field values can actually be stack allocated now (since v1.6, I think). The reason is that mutable structs are "unique individuals" - if you have a vector of them, some part of the code may hold a reference to one of the elements. So the vector can't be allocated/deallocated as a whole, each element needs to be allocated and GCed individually, in addition to the vector itself (which doesn't contain the mutable PODs, only pointers to them).

Struct fields have to be heap allocated if they are mutable or not concretely typed (e.g. a field of type AbstractFloat), whereas fields of and immutable concrete data type are stack-allocated, "embedded" in the object that contains them.