While we have optimized particle functions such as `accelerate`, `update` and `distribute`, they are all stuck with the same data structure (`Population`), which I believe is quite suboptimal. My guesstimate is that a 5 to 10 times performance boost on the particle part may be obtained by rewriting this from scratch.
The main performance loss, I believe, comes from excessive pointer dereferencing. The importance of using the right data structures was already visible in the performance gain when we removed one layer of dereferencing by making `Particle` a memory-contiguous structure instead of a class with vectors. Likewise, `Population` has many small `Cell` objects with `vector`s of particles, `vector`s of basis functions, etc. This is not very lightweight for such a performance-critical part of the code.

If we could reduce all variables in `Population` to a single level of dereferencing, I believe it would run much faster. We could, for instance, have one long `vector` containing all particles, and tag each particle with the cell it belongs to. All cell-related values could likewise be stored in one long vector per quantity. For example, rather than having a `vector` of `Cell`s, each with its own `vector` of `vertex_coordinates`, we would have a single `vector` of `vertex_coordinates` directly in `Population`. In 3D, the vertex coordinates of cell `c` are then `vertex_coordinates[4*c+i]` for `i` = 0, 1, 2 and 3, i.e. stored contiguously instead of behind many dereferences. Inlined accessor functions can be put in the `Population` declaration in the `.h` file to hide the index arithmetic, which the compiler will probably optimize away entirely. One could then write `pop.vertex_coordinates(c)`, etc.
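A minimal sketch of what such a flat layout could look like, assuming tetrahedral cells in 3D (4 vertices per cell, each vertex stored as one 3D point) and a particle carrying position, velocity, charge and mass. All names here (`FlatPopulation`, `FlatParticle`, `Point`, `count_in_cell`, `cell_centroid`) are hypothetical and not part of the existing code; a real replacement for `Population` would also need the basis-function data and so on. Note that C++ does not allow a data member and a member function with the same name, so the flat vector is private (`vertex_coords_`) and `vertex_coordinates(...)` is only the inline accessor.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not the existing Population/Cell API.
using Point = std::array<double, 3>;

// Plain, memory-contiguous particle, tagged with the index of its cell.
struct FlatParticle {
    Point x;            // position
    Point v;            // velocity
    double q;           // charge
    double m;           // mass
    std::size_t cell;   // index of the cell the particle currently lies in
};

class FlatPopulation {
public:
    // Vertices per cell: 4 for tetrahedra in 3D (an assumption of this sketch).
    static constexpr std::size_t nv = 4;

    explicit FlatPopulation(std::size_t num_cells)
        : num_cells_(num_cells), vertex_coords_(nv * num_cells) {}

    std::size_t num_cells() const { return num_cells_; }

    // Inline accessors hide the index arithmetic nv*c + i.
    // The asserts are bounds checks that vanish when compiled with NDEBUG.
    Point& vertex_coordinates(std::size_t c, std::size_t i) {
        assert(c < num_cells_ && i < nv);
        return vertex_coords_[nv * c + i];
    }
    const Point& vertex_coordinates(std::size_t c, std::size_t i) const {
        assert(c < num_cells_ && i < nv);
        return vertex_coords_[nv * c + i];
    }
    // Pointer to the nv contiguous vertices of cell c.
    const Point* vertex_coordinates(std::size_t c) const {
        assert(c < num_cells_);
        return vertex_coords_.data() + nv * c;
    }

    void add_particle(const FlatParticle& p) {
        assert(p.cell < num_cells_);
        particles_.push_back(p);
    }
    const std::vector<FlatParticle>& particles() const { return particles_; }

    // Count particles tagged with cell c (a linear scan over one vector).
    std::size_t count_in_cell(std::size_t c) const {
        std::size_t n = 0;
        for (const auto& p : particles_)
            if (p.cell == c) ++n;
        return n;
    }

private:
    std::size_t num_cells_;
    std::vector<Point> vertex_coords_;     // nv entries per cell, back to back
    std::vector<FlatParticle> particles_;  // all particles in one vector
};

// Centroid of cell c, read straight from the flat vertex array.
inline Point cell_centroid(const FlatPopulation& pop, std::size_t c) {
    const Point* v = pop.vertex_coordinates(c);
    Point centroid{0.0, 0.0, 0.0};
    for (std::size_t i = 0; i < FlatPopulation::nv; ++i)
        for (std::size_t d = 0; d < 3; ++d)
            centroid[d] += v[i][d] / FlatPopulation::nv;
    return centroid;
}
```

With this layout, getting the vertices of a particle's cell is `pop.vertex_coordinates(p.cell)`: one index computation into a single array, rather than following a pointer to a `Cell` object and then into that cell's own `vector`. The four vertices of a cell also sit next to each other in memory, so `vertex_coordinates(c) + 1` points at the same element as `&vertex_coordinates(c, 1)`.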