laas / metapod

A template-based robot dynamics library
GNU Lesser General Public License v3.0

Benchmark Woes #74

Closed martinfelis closed 10 years ago

martinfelis commented 10 years ago

Hello and happy new year!

I have been playing around with the benchmark and possibly ran into a bug. The benchmark runs two loops: the outer loop generates random robot configuration variables and the inner loop then calls the function that is being benchmarked. By default there are 100 outer and 1000 inner loops.

When changing to 1 outer and 100000 inner loops I get very different results for the individual function call durations even though the overall runtime is similar.

I ran 'time benchmark' on both variants. Here is the output of the last benchmark model and the time command:

Default benchmark (outer 100, inner 1000):

*************
Model NBDOF : 31
  average execution time :
jcalc: 1.82543µs
bcalc: 2.40218µs
rnea: 7.12526µs
rnea (without jcalc): 5.21813µs
crba: 7.08987µs
crba (without jcalc): 5.18196µs
jac (without jcalc): 4.82069µs
jac_point_robot (without bcalc): 7.57695µs
./benchmark/benchmark  11.62s user 0.00s system 99% cpu 11.635 total

Modified benchmark (outer 1, inner 100000):

*************
Model NBDOF : 31
  average execution time :
jcalc: 1.81856µs
bcalc: 4.20191µs
rnea: 11.1085µs
rnea (without jcalc): 15.9698µs
crba: 22.8785µs
crba (without jcalc): 27.8271µs
jac (without jcalc): 32.2924µs
jac_point_robot (without bcalc): 39.6024µs
./benchmark/benchmark  11.50s user 0.00s system 99% cpu 11.505 total

Any ideas?

Regards, Martin

sbarthelemy commented 10 years ago

Hello Martin,

Happy new year to you too!

I cannot reproduce the issue here (see numbers below); however, I do not use laas/metapod/master but a slightly modified version. I have no time to try to reproduce the issue on laas/metapod/master right now, but I will try to do so by the end of the week.

Two differences between my version and laas/metapod/master may come into play:

Note also that with a single outer loop you may get a very unfavourable set of inputs. Incidentally, I have already seen strange discrepancies when commenting some models and algorithms out of the benchmarks. My conclusion was that the compiler was making different inlining choices, some better than others.

Cheers

outer 100, inner 1000

$ ./build-linux64-release/sdk/bin/benchmark_sample_4 
Model NBDOF : 31
  average execution time :
jcalc: 1.61874µs
bcalc: 2.00013µs
rnea: 5.29213µs
rnea (without jcalc): 3.53799µs
crba (without jcalc): 3.53386µs
jac (without jcalc): 2.02279µs
jac_point_robot (without bcalc): 5.23169µs

outer 1, inner 100000

$ ./build-linux64-release/sdk/bin/benchmark_sample_4 
Model NBDOF : 31
  average execution time :
jcalc: 1.64495µs
bcalc: 1.96543µs
rnea: 5.31152µs
rnea (without jcalc): 3.47231µs
crba (without jcalc): 3.56129µs
jac (without jcalc): 2.02315µs
jac_point_robot (without bcalc): 5.08292µs

olivier-stasse commented 10 years ago

Hi Martin, Sébastien. I checked on Ubuntu 10.04 LTS (32-bit) with gcc 4.4.3.

outer 100, inner 1000

Model NBDOF : 31
  average execution time :
jcalc: 2.06439µs
bcalc: 2.87223µs
rnea: 7.59191µs
rnea (without jcalc): 5.50481µs
crba: 6.57188µs
crba (without jcalc): 4.37946µs
jac (without jcalc): 3.14048µs
jac_point_robot (without bcalc): 8.69764µs

outer 1, inner 100000

Model NBDOF : 31
  average execution time :
jcalc: 2.10482µs
bcalc: 5.01339µs
rnea: 12.7801µs
rnea (without jcalc): 18.2008µs
crba: 24.8866µs
crba (without jcalc): 29.3114µs
jac (without jcalc): 32.3227µs
jac_point_robot (without bcalc): 40.7903µs

@martinfelis Could you specify your gcc version and your distro?

I tried with gcc-4.7.2 and got the following results:

Model NBDOF : 31
  average execution time :
jcalc: 184.174µs
bcalc: 520.746µs
rnea: 1772.65µs
rnea (without jcalc): 2835.3µs
crba: 3960.71µs
crba (without jcalc): 4894.49µs
jac (without jcalc): 5281.35µs
jac_point_robot (without bcalc): 6659.37µs

There is definitely something fishy here...

martinfelis commented 10 years ago

It seems to be a bug in the non-boost timer. After installing the boost timer module (1.48), I had to change https://github.com/laas/metapod/blob/master/timer/src/timer.cc#L34 from #if defined(WITH_BOOST_TIMER) && BOOST_VERSION <= 104800 to #if defined(WITH_BOOST_TIMER) && BOOST_VERSION < 104800 to convince the timer to actually use the boost timer.

With that change both benchmark results are similar.

@sbarthelemy the inputs should not change the runtime (unless you use uninitialized Eigen values). That said, it could still happen that different inlining is done for different models.

sbarthelemy commented 10 years ago

It seems to be a bug in the non-boost timer.

@olivier-stasse, do you need to support boost < 1.48? I would be in favour of removing all those fall-backs.

After installing the boost timer module (1.48) I had to change https://github.com/laas/metapod/blob/master/timer/src/timer.cc#L34

Good catch, thank you

olivier-stasse commented 10 years ago

Thanks for the catch. Unfortunately, yes, I need the boost < 1.48 support for my robot...