jarllarsson closed 9 years ago
Seems to be solved now. The biggest offender seems to have been the excessive memory allocation done for the Jacobian calculations, especially for the GCVF forces, which are calculated several times, once per sub-chain. The largest possible Jacobian size is now calculated when building the characters, so only a single Jacobian matrix needs to be allocated per character calculation. This can of course be optimized further with more pre-steps.

Another issue was the CMatrix in which the Jacobian matrix was stored. Having it use dynamic 2D arrays was very time consuming. Changing it to a 1D array (with proper 2D-to-1D indexing) reduced the execution time considerably.

I also made some changes to account for false sharing, such as copying each thread's chunk of torques into thread-local memory to avoid stalling on shared cache lines. But it seems to have only made each scenario slower, so I'll try reverting it and see if I get a performance boost.
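For reference, the two Jacobian changes above can be sketched together roughly like this. It is only an illustration, not the actual CMatrix code: `FlatMatrix`, `reshape`, `at` and `raw` are hypothetical names, and row-major storage of `float` values is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CMatrix: one contiguous row-major buffer that is
// allocated once for the largest Jacobian and then reused for smaller ones.
class FlatMatrix {
public:
    // Single allocation, sized for the largest matrix that will be requested.
    FlatMatrix(std::size_t maxRows, std::size_t maxCols)
        : data_(maxRows * maxCols, 0.0f), rows_(maxRows), cols_(maxCols) {}

    // Reuse the existing buffer for a matrix that fits in it; no allocation.
    void reshape(std::size_t rows, std::size_t cols) {
        assert(rows * cols <= data_.size());
        rows_ = rows;
        cols_ = cols;
        std::fill(data_.begin(), data_.begin() + rows * cols, 0.0f);
    }

    // 2D-to-1D indexing: element (r, c) lives at offset r * cols + c.
    float& at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    const float* raw() const { return data_.data(); }

private:
    std::vector<float> data_;
    std::size_t rows_;
    std::size_t cols_;
};
```

Compared with an array of separately allocated rows, this needs one allocation instead of one per row, and a whole row sits in adjacent memory, which is presumably where most of the saved time comes from.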
Issues I thought about:
- False Sharing
- Memory Inefficiencies
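For the false sharing item, the idea of working on a thread-local copy of a torque chunk could look roughly like the sketch below. `scaleChunkLocal` is a hypothetical helper, scaling by a factor stands in for the real per-torque work, and each worker thread is assumed to call it on its own non-overlapping `[begin, end)` range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread worker: copy the chunk into private memory, do all
// the work there, then write the results back to the shared array in one pass.
void scaleChunkLocal(std::vector<float>& torques,
                     std::size_t begin, std::size_t end, float factor) {
    assert(begin <= end && end <= torques.size());

    // Private copy, so intermediate writes never touch shared cache lines.
    std::vector<float> local(torques.begin() + begin, torques.begin() + end);

    for (float& t : local) {
        t *= factor;  // placeholder for the real torque computation
    }

    // Single write-back of the finished chunk.
    std::copy(local.begin(), local.end(), torques.begin() + begin);
}
```

Note that the copy itself allocates and moves memory, so when the per-element work is cheap it can cost more than the false sharing it avoids.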