jarllarsson / promenade

Code repository for Master's Thesis: "Performance of Physics-Driven Procedural Animation of Character Locomotion For Bipedal and Quadrupedal Gait"

parallel code is slower than serial #23

Closed jarllarsson closed 9 years ago

jarllarsson commented 9 years ago

Issues I thought about: false sharing, memory inefficiencies.

jarllarsson commented 9 years ago

Seems to be solved now. The biggest offender seems to have been the excessive memory allocation done for the Jacobian calculations, especially for the GCVF forces, which are calculated several times, for each sub-chain. The largest possible Jacobian size is now calculated when building the characters, so only a single Jacobian matrix is needed per character calculation. This can of course be optimized further with more pre-steps.

Another issue was the CMatrix in which the Jacobian matrix was stored. Having it use dynamic 2D arrays was very time-consuming. Changing it to a 1D array (with proper 2D-to-1D indexing) reduced the execution time by a lot.

I also made some changes to account for false sharing, such as copying each thread's chunk of torques into thread-local memory to avoid stalling on cache lines. But that seems to have only made each scenario slower, so I'll try reverting it and see if I get a performance boost.
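For anyone reading this later, here is a rough sketch of the two memory changes described above: a matrix backed by a single 1D buffer with row-major 2D-to-1D indexing, allocated once at the largest size needed and then reused for every sub-chain. This is not the actual CMatrix code from the repository. `FlatMatrix`, `reshape`, `largestChainDofCount`, and the choice of `float` storage are all hypothetical names and assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CMatrix (not the repository's implementation):
// all elements live in one contiguous buffer instead of separately
// allocated rows.
class FlatMatrix {
public:
    // Allocate once, at the largest size any sub-chain will need.
    FlatMatrix(std::size_t maxRows, std::size_t maxCols)
        : m_data(maxRows * maxCols, 0.0f), m_rows(maxRows), m_cols(maxCols) {}

    // Reuse the existing buffer for a matrix that fits within the capacity.
    // No allocation happens here; the active region is only zeroed.
    void reshape(std::size_t rows, std::size_t cols) {
        assert(rows * cols <= m_data.size());
        m_rows = rows;
        m_cols = cols;
        std::fill_n(m_data.begin(), rows * cols, 0.0f);
    }

    // Row-major 2D-to-1D indexing: element (r, c) is stored at r * cols + c.
    float& operator()(std::size_t r, std::size_t c) {
        assert(r < m_rows && c < m_cols);
        return m_data[r * m_cols + c];
    }
    float operator()(std::size_t r, std::size_t c) const {
        assert(r < m_rows && c < m_cols);
        return m_data[r * m_cols + c];
    }

    std::size_t rows() const { return m_rows; }
    std::size_t cols() const { return m_cols; }
    std::size_t capacity() const { return m_data.size(); }

private:
    std::vector<float> m_data;
    std::size_t m_rows;
    std::size_t m_cols;
};

// Hypothetical pre-step done when building a character: find the largest
// number of degrees of freedom over all of its sub-chains, so that one
// Jacobian buffer can serve every chain.
std::size_t largestChainDofCount(const std::vector<std::size_t>& chainDofCounts) {
    return chainDofCounts.empty()
               ? 0
               : *std::max_element(chainDofCounts.begin(), chainDofCounts.end());
}
```

With this layout, a character would build one `FlatMatrix(3, largestChainDofCount(dofs))` up front (assuming a 3-row positional Jacobian) and call `reshape` before filling in each sub-chain, so the per-frame loop does no heap allocation for the Jacobian.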