ros / geometry2

A set of ROS packages for keeping track of coordinate transforms.
190 stars 275 forks source link

Use snprintf instead of stringstream to increase performance of lookupTransform() in error cases. #470

Closed peci1 closed 4 years ago

peci1 commented 4 years ago

For some reason, the stringstream construction of tf2 exception messages is much slower than snprintf().

In some cases, there may be lots of these messages generated, thus I think it makes sense to save time here.

A simple substitution of the stringstream approach with snprintf brings a two- to three-fold speedup for the error cases.

Speed test with the additions from #469 (the updated speed_test is not included in this PR, but I used it to measure the performance gain):

noetic-devel:

Info:    9 to 4
Info:    Doing 1000000 10-level 1.000000-interval tests
Info:    lookupTransform at Time(0) took 1.588220 for an average of 0.000001588
Info:    lookupTransform at Time(1) took 1.118618 for an average of 0.000001119
Info:    lookupTransform at Time(1.5) took 2.125916 for an average of 0.000002126
Info:    lookupTransform at Time(2) took 1.009223 for an average of 0.000001009
Info:    canTransform at Time(0) took 1.105921 for an average of 0.000001106
Info:    canTransform at Time(1) took 0.658616 for an average of 0.000000659
Info:    canTransform at Time(1.5) took 0.879653 for an average of 0.000000880
Info:    canTransform at Time(2) took 0.598742 for an average of 0.000000599
Info:    canTransform at Time(3) without error string took 1.912571 for an average of 0.000001913
Info:    canTransform at Time(3) with error string took 4.456371 for an average of 0.000004456

This patch:

Info:    9 to 4
Info:    Doing 1000000 10-level 1.000000-interval tests
Info:    lookupTransform at Time(0) took 1.548510 for an average of 0.000001549
Info:    lookupTransform at Time(1) took 1.044787 for an average of 0.000001045
Info:    lookupTransform at Time(1.5) took 2.035262 for an average of 0.000002035
Info:    lookupTransform at Time(2) took 0.976278 for an average of 0.000000976
Info:    canTransform at Time(0) took 1.073062 for an average of 0.000001073
Info:    canTransform at Time(1) took 0.632388 for an average of 0.000000632
Info:    canTransform at Time(1.5) took 0.828700 for an average of 0.000000829
Info:    canTransform at Time(2) took 0.603433 for an average of 0.000000603
Info:    canTransform at Time(3) without error string took 0.855472 for an average of 0.000000855
Info:    canTransform at Time(3) with error string took 1.685970 for an average of 0.000001686

There's one change that might be reverted if needed - the removal if <sstream> include from time_cache.h. It was just no longer needed for tf2, so I removed it, but if you think it'd be an API break, it can be put back.

There should also be discussion on the string length limit in buffer_core. I chose 1000. Whatever else can be used. I also tried a solution with dynamic allocation of memory for the string, but it looked ugly (but the performance was nearly the same).

peci1 commented 4 years ago

Here's my rationale from a node that processes lots of laserscans which are quite fresh so they often need some time before their transform gets into the cache.

Snímek obrazovky pořízený 2020-06-18 18-14-05

This is a flamegraph where I circled all calls to createExtrapolationException*. Roughly estimated, these calls take about 20% run time of my code. With this PR, it could be less than 10%.

Merging #469 could help a bit with this as most of the internal canTransform() calls will no longer create the strings, but it still seems to me that people would profit from this further optimization.

tfoote commented 4 years ago

Manually rebased and merged in 52bff88ecad8821802ad5e04dcb711b582c39f5a