Closed oschulz closed 1 year ago
Hello @oschulz. By making PseudoJet immutable you have dropped the on-demand calculation of rapidity and phi. It results in a 10% increase of runtime.
Is your point of individual field allocation also true for struct containig only PODs?
The allocation numbers I get for one run on the 100 events are:
Without this PR:
With the PR:
Keeping only changed on HistoryElement:
So making PseudoJet immutable, reduces the number of allocations, although it increases the amount of allocated memory.
I've tried with mutable fields (using 0-dim array) for eta and phi and the results are worse than in all other configurations.
Philippe.
By making PseudoJet immutable you have dropped the on-demand calculation of rapidity and phi. It results in a 10% increase of runtime.
Yes, I did, though in my benchmarks I didn't see a significant performance penalty due to that. But then, effects on the order of 10% are tricky to benchmark because they can depend on the data size, caching effects, multi- vs. single-threaded execution, etc.
The rapidity and phi calculation looked fairly cheap, so I though it might be easiest to not cache them. In principle we could implement a caching for them while keeping PseudoJet
immutable, I think, but it may not be worth it (would require more in-depth benchmarking) - it could even lead to worse performance in some scenarios due to increased branch-prediction misses in the CPU. It will also depend on how often these values are requested for the same jet, and for what percentage of jets they don't need to be calculated at all, of course.
Is your point of individual field allocation also true for struct containig only PODs
Julia (pretty much any language with automatic GC) needs to allocate any mutable struct on the heap. Immutable structs that have mutable structs as field values can actually be stack allocated now (since v1.6, I think). The reason is that mutable structs are "unique individuals" - if you have a vector of them, some part of the code may hold a reference to one of the elements. So the vector can't be allocated/deallocated as a whole, each element needs to be allocated and GCed individually, in addition to the vector itself (which doesn't contain the mutable PODs, only pointers to them).
Struct fields have to be heap allocated if they are mutable or not concretely typed (e.g. a field of type AbstractFloat
), whereas fields of and immutable concrete data type are stack-allocated, "embedded" in the object that contains them.
This turns
AntiKt.PseudoJet
andHistoryElement
immutable. It doesn't speed up the algorithm itself, but reduces memory allocation significantly:Before:
After:
The memory allocation cost of mutable structs is higher in Julia than in C++: An array of fixed-sized mutable structs is only a single memory allocation in C++, but in Julia, every element of the array must be allocated on the heap separately (this is true for pretty much any language with automatic garbage collection). "Small" structs should be immutable in Julia.
This becomes especially relevant when running on many threads, since threads can't do heap-(de-)allocation in parallel (both in Julia and C++, also Julia doesn't to parallel GC yet), so the alloc/gc-percentage goes up as the number of threads increases, until alloc/GC-cost dominates.
The remaining alloc/GC cost in
antikt
is mainly due to the mutableTiledJet
. Using indices instead of pointers forTiledJet
-references would make it possible to makeTiledJet
immutable as well, reducing GC cost to almost zero.ArraysOfArrays.VectorOfArrays
could be used to reduce memory allocations for the vector-of-vectors-of-jets thatantikt
returns.In general, using graph data structures based on adjacency lists (e.g. via Graphs.jl) would probably be better than using pointer/reference-based graph structures.