mimesis-inria / caribou

Multi-physics computation library
GNU Lesser General Public License v3.0

Optimized addForce #101

Closed AlbanOdot closed 2 years ago

AlbanOdot commented 2 years ago

Didn't find any PR template, so I'm going freestyle.

`addForce` would compute element-wise nodal forces by accumulating them in a local structure aptly named `nodal_forces`. Once that was done, it would iterate over this `nodal_forces` structure to write into the actual SOFA `VecDeriv`, thus iterating twice over each element's contributions while also ignoring the mapped Eigen forces vector already declared at l113:

```cpp
Eigen::Map<Eigen::Matrix<Real, Eigen::Dynamic, Dimension, Eigen::RowMajor>> forces (&(sofa_f[0][0]), nb_nodes, Dimension);
```
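For readers skimming along, here is a minimal sketch of the two-pass pattern described above; `Element`, `compute_element_forces`, and the surrounding types are placeholders I made up, not the actual caribou code:

```cpp
#include <array>
#include <cstddef>
#include <vector>
#include <Eigen/Dense>

using Real = double;
constexpr int Dimension = 3;
using NodeForce = Eigen::Matrix<Real, Dimension, 1>;

struct Element { std::vector<std::size_t> node_indices; };

// Hypothetical stand-in for the per-element force computation.
std::vector<NodeForce> compute_element_forces(const Element & e) {
    return std::vector<NodeForce>(e.node_indices.size(), NodeForce::Zero());
}

void add_force_two_pass(std::vector<std::array<Real, Dimension>> & sofa_f,
                        const std::vector<Element> & elements) {
    const std::size_t nb_nodes = sofa_f.size();
    // Pass 1: accumulate every element's contribution into a local buffer.
    std::vector<NodeForce> nodal_forces(nb_nodes, NodeForce::Zero());
    for (const auto & element : elements) {
        const auto element_forces = compute_element_forces(element);
        for (std::size_t i = 0; i < element.node_indices.size(); ++i)
            nodal_forces[element.node_indices[i]] += element_forces[i];
    }
    // Pass 2: a second full sweep just to copy the buffer into SOFA's vector.
    for (std::size_t n = 0; n < nb_nodes; ++n)
        for (int d = 0; d < Dimension; ++d)
            sofa_f[n][d] -= nodal_forces[n][d];
}
```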

This PR removes the second loop by directly subtracting each element's force contribution and uses the mapped forces vector.
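Roughly, the change amounts to writing each element's contribution straight through the map; a sketch using the same placeholder types as above:

```cpp
// Single-pass variant: the map aliases SOFA's vector, so every write
// lands directly in sofa_f and the second copy loop disappears.
void add_force_single_pass(std::vector<std::array<Real, Dimension>> & sofa_f,
                           const std::vector<Element> & elements) {
    const auto nb_nodes = static_cast<Eigen::Index>(sofa_f.size());
    Eigen::Map<Eigen::Matrix<Real, Eigen::Dynamic, Dimension, Eigen::RowMajor>>
        forces (&(sofa_f[0][0]), nb_nodes, Dimension);
    for (const auto & element : elements) {
        const auto element_forces = compute_element_forces(element);
        for (std::size_t i = 0; i < element.node_indices.size(); ++i)
            forces.row(static_cast<Eigen::Index>(element.node_indices[i]))
                -= element_forces[i].transpose();
    }
}
```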

While we could use the SOFA `WriteAccessor`, the mapped force vector was already there and will be useful for the upcoming PR.

Alban Odot

jnbrunet commented 2 years ago

Did you see a large difference in computation time here? We had to do it this way since it was accumulating too many coefficients with non-linear elements (the sparse matrix accumulates coefficients instead of summing them), which resulted in gigabytes of RAM being used with large meshes.
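For future readers, a small illustration of the kind of blow-up being described here, using Eigen's triplet-based assembly as a stand-in (I'm assuming a triplet-style accumulation; the exact mechanism in caribou may differ):

```cpp
#include <iostream>
#include <vector>
#include <Eigen/Sparse>

int main() {
    // Every per-element, per-integration-point contribution is stored as its
    // own triplet; duplicates hitting the same (row, col) are only merged at
    // assembly time, so the intermediate list grows with the total number of
    // contributions rather than with the final number of non-zeros.
    const int n = 4;
    std::vector<Eigen::Triplet<double>> coefficients;
    for (int contribution = 0; contribution < 1000; ++contribution)
        coefficients.emplace_back(0, 0, 1.0); // same entry, stored 1000 times
    Eigen::SparseMatrix<double> K(n, n);
    K.setFromTriplets(coefficients.begin(), coefficients.end()); // summed here
    std::cout << "stored contributions: " << coefficients.size()  // 1000
              << ", non-zeros after assembly: " << K.nonZeros()   // 1
              << "\n";
}
```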

If the performance gain is large, I can agree with this PR, but otherwise I think we should find a better way to handle this...

AlbanOdot commented 2 years ago

Hi! I tested the code on an 81,000-DOF falling cube. I didn't notice much gain, if any. I'll try to improve the code another way. You can reject the PR 👍🏻

AlbanOdot commented 2 years ago

Maybe we could have a discussion about this part of the code and see if we can also improve the Python interface!

jnbrunet commented 2 years ago

> Hi! I tested the code on an 81,000-DOF falling cube. I didn't notice much gain, if any. I'll try to improve the code another way. You can reject the PR 👍🏻

Thanks a lot for testing this !

Since most of these loop bounds are known at compile time (the number of nodes and integration points per element is static), it is possible that the compiler optimized them during compilation. It could be interesting to redo the test with quadratic (or cubic, they should be available soon) hexahedra, since they have more integration points.
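To illustrate why the static bounds matter (a toy sketch, not caribou's actual element code): when the trip counts are template parameters, the compiler can fully unroll and vectorize the nest, which can hide the cost of an extra pass on small linear elements.

```cpp
#include <array>

// Toy element with sizes fixed at compile time, loosely mimicking how the
// number of nodes and integration points can be static template parameters.
template <int NumberOfNodes, int NumberOfGaussPoints, int Dimension>
struct ToyElement {
    template <typename Real>
    static void accumulate(std::array<Real, NumberOfNodes * Dimension> & f, Real w) {
        // Both bounds are compile-time constants: the compiler is free to
        // unroll and vectorize this nest entirely.
        for (int g = 0; g < NumberOfGaussPoints; ++g)
            for (int i = 0; i < NumberOfNodes * Dimension; ++i)
                f[i] += w;
    }
};

int main() {
    // Linear hexahedron: 8 nodes, typically 8 Gauss points. A quadratic one
    // has 20 (or 27) nodes and typically 27 Gauss points, hence much more work.
    std::array<double, 8 * 3> f{};
    ToyElement<8, 8, 3>::accumulate(f, 0.5);
}
```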

> Maybe we could have a discussion about this part of the code and see if we can also improve the Python interface!

Absolutely! Optimizing this part of the code is always a great challenge, and the reward of seeing better performance is priceless.