Add coriolis matrix computation

haudren commented 6 years ago

This PR adds the work I did while I was still at the lab: it allows to compute the full Coriolis matrix (i.e. C(q, qd) instead of C(q, qd)qd).

I know it is not perfect, but I would like to put it out there !

There might also be some overlap with #46 as I had to implement a "block expand" functionality to obtain decent performance.

Things to fix:

[x] Indentation should use tabs
[x] Tests should be a real test, not just a print
[x] Windows build is probably broken

Maybes (what's your opinion @gergondet ):

[x] Make a Coriolis benchmark
[x] Split the Jacobian expand logic from the Coriolis side
[x] Add python bindings

haudren commented 6 years ago

I used the following clang-format config to fix my indentation, might be useful to put it somewhere in the project.

BasedOnStyle: LLVM
IndentWidth: 8
UseTab: Always
BreakBeforeBraces: Allman
AllowShortIfStatementsOnASingleLine: false
IndentCaseLabels: false

haudren commented 6 years ago

@gergondet : Any remarks ?

gergondet commented 6 years ago

Sorry for the long delay, I was on vacation but I am back to work now!

Make a Coriolis benchmark

Sure.

Split the Jacobian expand logic from the Coriolis side

Sure. That could be use as in #46

Few things that I didn't add to the review:

This could be part of the Jacobian interface
We could store the compact path in the Jacobian
I would use a light structure rather than a raw array for Block, this make compactExpandAdd hard to decipher without looking at how the compact path is computed

Add python bindings

Will do when the C++ version is finalized.

haudren commented 6 years ago

@gergondet : I think I fixed a lot of the mentioned points, but I'm out of time for today, I'll come back to it tomorrow to fix the rest. Feel free to modify the branch as you see fit of course.

haudren commented 6 years ago

Note that the build is failing because it can't find boost on Windows apparently? Weird.

haudren commented 6 years ago

I tried implementing your suggestion of using only one res and passing blocks of it:

Version	Debug (100 rounds)	Release (1000 rounds)
One allocation	7.75s	0.13s
Multiple allocs	7.94s	0.15s

~~So it is slower to have a single allocation in Release apparently, the compiler must be doing some magic behind the scenes.~~

It is marginally faster to have a single allocation in Release.

here is the diff:

diff --git a/src/Coriolis.cpp b/src/Coriolis.cpp
index 8c2243f..1bdbcc8 100644
--- a/src/Coriolis.cpp
+++ b/src/Coriolis.cpp
@@ -62,8 +62,8 @@ Blocks compactPath(const rbd::Jacobian& jac,
 }

 void expandAdd(const Blocks& compactPath,
-           const Eigen::MatrixXd& jacMat,
-           Eigen::MatrixXd& res)
+       const Eigen::Ref<const Eigen::MatrixXd>& jacMat,
+       Eigen::MatrixXd& res)
 {
    for(const auto& b1 : compactPath)
    {
@@ -105,6 +105,14 @@ const Eigen::MatrixXd& Coriolis::coriolis(const rbd::MultiBody& mb, const rbd::M

         coriolis_.setZero();

+        int res_size = 0;
+        for(const auto& jac : jacs_)
+        {
+          res_size = std::max(res_size, jac.dof());
+        }
+
+        Eigen::MatrixXd res = Eigen::MatrixXd::Zero(res_size, res_size);
+
    for(int i = 0; i < mb.nrBodies(); ++i)
    {
        const auto & jac = jacs_[i].jacobian(mb, mbc);
@@ -129,10 +137,10 @@ const Eigen::MatrixXd& Coriolis::coriolis(const rbd::MultiBody& mb, const rbd::M
         *        + J_{w_i}^T R_i I_i R_i^T \dot{J}_{w_i}
         *        + J_{w_i}^T \dot{R}_i I_i R_i^T J_{w_i} */

-       Eigen::MatrixXd res = mass*jvi.transpose()*jDvi
+       res.topLeftCorner(jacs_[i].dof(), jacs_[i].dof()) = mass*jvi.transpose()*jDvi
                + jwi.transpose()*((rot*ir)*jDwi +  (rDot*ir)*jwi);

-       expandAdd(compactPaths_[i], res, coriolis_);
+       expandAdd(compactPaths_[i], res.topLeftCorner(jacs_[i].dof(), jacs_[i].dof()), coriolis_);
    }

    return coriolis_;
diff --git a/src/RBDyn/Coriolis.h b/src/RBDyn/Coriolis.h
index d9512c4..904cd5e 100644
--- a/src/RBDyn/Coriolis.h
+++ b/src/RBDyn/Coriolis.h
@@ -73,7 +73,7 @@ RBDYN_DLLAPI Blocks compactPath(const rbd::Jacobian& jac,
  * @param res The accumulator matrix
  **/
 RBDYN_DLLAPI void expandAdd(const Blocks& compactPath,
-           const Eigen::MatrixXd& jacMat,
+           const Eigen::Ref<const Eigen::MatrixXd>& jacMat,
            Eigen::MatrixXd& res);

 /**
diff --git a/tests/CoriolisTest.cpp b/tests/CoriolisTest.cpp
index 636d605..a1a1cc5 100644
--- a/tests/CoriolisTest.cpp
+++ b/tests/CoriolisTest.cpp
@@ -38,7 +38,7 @@ BOOST_AUTO_TEST_CASE(CoriolisTest)
    rbd::ForwardDynamics fd(mb);
    rbd::Coriolis coriolis(mb);

-   const static int ROUNDS = 100;
+   const static int ROUNDS = 1000;

    for (int i = 0; i < ROUNDS; ++i)
    {

haudren commented 6 years ago

My bad, I forgot to change the number of rounds in the Release/Multiple Allocs case. Updated the table.

gergondet commented 6 years ago

I've adressed the 3 remaining todos and used the Block feature in Momentum, this should now close #46 and #45

Actually, this opens the possibility for a very small optimization in the computation of the momentum dot matrix.

I will try to figure out why the AppVeyor build is failing tomorrow...

haudren commented 6 years ago

@gergondet : Nice work, thanks ! Let me know if you need anything else :+1:

haudren commented 6 years ago

I also made a quick bench for momentum:

Revision	Momentum matrix	Momentum matrix dot
a1e6f0ad (master)	5.6e-5	1.2e-4
56b5449b (PR)	2.7e-5	5.5e-5

So we are on average about twice as fast, not bad :smile:

@gergondet : Should I commit this benchmark in JacobianBench ?

gergondet commented 6 years ago

Ok, CI should be ok now after updating the Boost version.

Will merge when all CI lands.

haudren commented 6 years ago

Thanks :fireworks:

jrl-umi3218 / RBDyn

Add coriolis matrix computation #49