Use of SOFA_NO_VMULTIOP

hugtalbot commented 4 years ago

The option SOFA_NO_VMULTIOP in ODESolver (aka integration schemes) makes the reading (i.e. the understanding for new users) and the dev cumbersome.

I know the VMULTIOP allows for optimized vector operations with the MultiOp. Would there now be a reason to just keep one upon the other ? Any other thought?

Suggested labels:

jnbrunet commented 4 years ago

I think the CGLinearSolver has a good example on such "optimization". Let's look at the following un-optimized code:

Version 1

x.peq(p,alpha);                 // x = x + alpha p
r.peq(q,-alpha);                // r = r - alpha q

which can be seen as:

VOp op1 = "x = x + alpha p";
Visitor(op1);

VOp op2 = "r = r - alpha q";
Visitor(op2);

and its optimized version:

Version 2

VMultiOp ops;
ops.resize(2);
ops[0].first = (MultiVecDerivId)x;
ops[0].second.push_back(std::make_pair((MultiVecDerivId)x,1.0));
ops[0].second.push_back(std::make_pair((MultiVecDerivId)p,alpha));
ops[1].first = (MultiVecDerivId)r;
ops[1].second.push_back(std::make_pair((MultiVecDerivId)r,1.0));
ops[1].second.push_back(std::make_pair((MultiVecDerivId)q,-alpha));
this->executeVisitor(simulation::MechanicalVMultiOpVisitor(params, ops));

which can be seen as:

VMultiOp ops;
ops[0] = "x = x + alpha p"
ops[1] = "r = r - alpha q"
Visitor(ops);

Finally, imagine we have the following scenario:

                CGLinearSolver
                       |
         +-------------+--------------
         |                           |
+--------+-----------+    +----------+---------+
|         MO1        |    |         MO2        |
+--------------------+    +--------------------+
|x: [x0, x1, ..., xn]|    |x: [x0, x1, ..., xm]|
|p: [p0, p1, ..., pn]|    |p: [p0, p1, ..., pm]|
|r: [r0, r1, ..., rn]|    |r: [r0, r1, ..., rm]|
|q: [q0, q1, ..., qn]|    |q: [q0, q1, ..., qm]|
+--------------------+    +----------+---------+
                                     |
                          mapping1   |   mapping2
                      +--------------+-------------+
                      |                            |
           +----------+---------+       +----------+---------+
           |         MO3        |       |         MO4        |
           +--------------------+       +--------------------+
           |x: [x0, x1, ..., xl]|       |x: [x0, x1, ..., xo]|
           |p: [p0, p1, ..., pl]|       |p: [p0, p1, ..., po]|
           |r: [r0, r1, ..., rl]|       |r: [r0, r1, ..., ro]|
           |q: [q0, q1, ..., ql]|       |q: [q0, q1, ..., qo]|
           +--------------------+       +--------------------+

Version 1 will do:

Launch a visitor in the CG's subgraph (x.peq(p,alpha);) MO 1 : x = x + alpha p M0 2 : x = x + alpha p MO 3 : x = x + alpha p MO 4 : x = x + alpha p
Launch a visitor in the CG's subgraph (r.peq(q,-alpha);) MO 1 : r = r - alpha q M0 2 : r = r - alpha q MO 3 : r = r - alpha q MO 4 : r = r - alpha q

Version 2 will do:

Launch a visitor in the CG's subgraph MO 1 : x = x + alpha p r = r - alpha q MO 2 : x = x + alpha p r = r - alpha q MO 3 : x = x + alpha p r = r - alpha q MO 4 : x = x + alpha p r = r - alpha q

Unless I'm missing something, the optimized version only removes one subgraph visit and allows the compilator to *maybe* optimize two subsequent operations on vectors. However, I don't think the compiler will actually do something here as the MechanicalState::vMultiOp simply makes two calls to MechanicalState::vOp which, well, is a virtual function and can't be inlined.

To the question to whether or not we should keep one or the other, I guess we would have to benchmark it (use a scene with multiple top level mechanical object's, and multiple mapping levels). If the optimized version do not brings a lot of speed (which would be my guess since the cost of a visit isn't very big, but I may be wrong), I would remove the optimization as it is quite confusing for the reader.

damienmarchal commented 4 years ago

Hi Hugo,

Thanks for the question and @jnbrunet for the very interesting answer.

To me #define activated feature must always be removed in any case. So the question should be, do we transform that into a runtime feature that is activated by a data or to directly remove it. If version "optimized" is always faster we could probably keep it...but refactoring the API using c++11&c++17 so it look shorter and nicer.

I just notice this ugly #ifdef DISPLAY_TIME in CGLinearSolver......cleaning... cleaning...

hugtalbot commented 4 years ago

Thanks @jnbrunet for the interesting feedback indeed. I can run the tests!

Agreed @damienmarchal What is this DISPLAY_TIME!?! wtf!

damienmarchal commented 4 years ago

DISPLAY_TIME ...well it should follow the same trajectory that SOFA_NO_VMULTIOP...be replaced by the AdvancedTimer or removed.

hugtalbot commented 4 years ago

See #1267 :)

sofa-framework / sofa

Use of SOFA_NO_VMULTIOP #1263