Create function that returns mutiple transforms based on a list of frames

fjandrad commented 4 years ago

In cases where we are updating information outside of iDyntree. Such as the new Matlab visualizer .

The idea for this function is:

Something like

bool getWorldTransforms(const std::vector<std::string>& frames, std::vector<iDynTree::Transform>& output);

cc @S-Dafarra @traversaro

S-Dafarra commented 4 years ago

A first dumb implementation could simply use a for cycle and call the getWorldTransform method for each of the requested frames, checking that they actually exist. Later, we can try to identify if there is some bottleneck.

fjandrad commented 4 years ago

For testing purposes the following funcitons were implemented:

std::vector<iDynTree::Transform> getWorldTransforms( std::vector<int> frameIndexes);   
std::vector<iDynTree::Transform> getWorldTransformsIndex( std::vector<int> frameIndexes);
std::vector<iDynTree::Transform> getWorldTransforms(std::vector<std::string> frameNames);
std::vector<iDynTree::Matrix4x4> getWorldTransformsAsHomogeneous(std::vector<std::string> frameNames);

For testing/benchmarking the following script was used:

testGetWorldTransformTimes.txt

The results are getWorldBenchMarking

The lines with ind as legend means get each transform individually in a for loop. The others are in the same order as the mentioned implemented functions. This suggest that any improvements on the C++ side might have negligible impact on the overall timings in Matlab.

It can be seen that there seems to be no gain in getting all the transforms except for the function that gives directly the homogenous transform. The reason is probably that even if we get a vector of transforms we still need to iterate through the vector and transform each individual transform to matlab matrix. The reason homogeneous transform seem to have the upper hand is because we save the asHomogeneousTransform step in the bindings.

Here we can se why it takes less time to get directly the homogeneous transform: toMatlabCalls

Despite it takes half the time to do just toMatlab, the time decreases by a third since we go through the iterator to ge the individual matrix and then do to Matlab. If we do it in a single line we get a further optimization although small.

furtherImprove

I doubt it is possible, but what could further help for the visualization in terms of time optimization is to be able to get from a single swig function directly matlab matrices without the need to do the toMatlab call.

fjandrad commented 4 years ago

Looking at the code I see there is some logic to get the semantics. If we remove the logic for the semantics that is not being used in the visualization in Matlab we get:

tryNoSemantics

It seems removing the semantics does not give any gain.

fjandrad commented 4 years ago

With the changes and the use of the new function we have that the mean of 100 update times are:

% using iDyntree string vector
stringVector_time =     0.0018
% using a string cell array with the link names
cellArray_time =    0.0275

It is an order of magnitude difference. performanceComparison

Done in 2a846e0738582f9d671e63ccf538a1cf1265e067

fjandrad commented 4 years ago

It seems that CI is failing. The message is:

@traversaro this does not seem related to the changes I did. Could I proceed to open a pull request? Should I open an issue for this error?

S-Dafarra commented 4 years ago

It seems that CI is failing.

See also https://github.com/robotology/idyntree/pull/667#issuecomment-608435760. Probably it is enough for you to rebase on master.

S-Dafarra commented 4 years ago

I would have some curiosities:

What is the difference between std::vector<iDynTree::Transform> getWorldTransforms( std::vector<int> frameIndexes); and std::vector<iDynTree::Transform> getWorldTransformsIndex( std::vector<int> frameIndexes);.
In C++ this implementation may be inefficient because you are returning by copy a vector. If instead you pass the output vector as a parameter, you can resize it once beforehand. Then, multiple calls of this function would avoid further memory allocation, which can be very costly in C++. Is it possible in Matlab to pass the output vector as a parameter or there are difficulties? Would it make any difference on the time?
In case it is possible to pass the output vector as a parameter, to which object would it correspond in the C++ side? Is it possible to write an implementation which can avoid the toMatlab or fromMatlab (if it exists) calls? Maybe using templates?
Is it worth to try a similar test fully in C++ to see if those times are so high also there?
How do you use the output vector in Matlab? It seems that Matlab is pretty bad in for loops and if clauses. Maybe you can move some more logic in the C++ side to reduce the amount of loops done in Matlab.

traversaro commented 4 years ago

It seems that CI is failing. The message is:

@traversaro this does not seem related to the changes I did. Could I proceed to open a pull request? Should I open an issue for this error?

This was fixed by https://github.com/robotology/idyntree/pull/667 , please rebase and it should work fine.

traversaro commented 4 years ago

I doubt it is possible, but what could further help for the visualization in terms of time optimization is to be able to get from a single swig function directly matlab matrices without the need to do the toMatlab call.

It is possible, but you need to inject the C++ code directly in the SWIG generated code, see for example as the toMatlab call itself is implemented: https://github.com/robotology/idyntree/blob/114671fb1a012f6214b35b8462d463fea99781fd/bindings/matlab/matlab_matvec.i#L119 .

fjandrad commented 4 years ago

> What is the difference between std::vector<iDynTree::Transform> getWorldTransforms( std::vector<int> frameIndexes); and std::vector<iDynTree::Transform> getWorldTransformsIndex( std::vector<int> frameIndexes);.

It was a test I did when seeing that the time between using indexes and a frame name was pretty much the same. I had the doubt if maybe because of the overloading of the function something was taking a bit of extra time. So I created one that only takes integers, but the result was the same. It was deleted later since it was purely to test that.

In C++ this implementation may be inefficient because you are returning by copy a vector. If instead you pass the output vector as a parameter, you can resize it once beforehand. Then, multiple calls of this function would avoid further memory allocation, which can be very costly in C++. Is it possible in Matlab to pass the output vector as a parameter or there are difficulties? Would it make any difference on the time?

My understanding is that we can don't know if it would be more efficient. I can give it a try.

In case it is possible to pass the output vector as a parameter, to which object would it correspond in the C++ side? Is it possible to write an implementation which can avoid the toMatlab or fromMatlab (if it exists) calls? Maybe using templates?

Avoiding the toMatlab is something I also want to try, even without sending the output vector. For the C++ side I think is straight forward if I create a iDynTree.MatrixVector variable ( soon to be renamed to Matrix4x4Vector ). Now using both the sending of the output parameter and avoid using the toMatlab conversion is something that I'm not sure out my head how to do.

Is it worth to try a similar test fully in C++ to see if those times are so high also there?

I honestly doubt is worth it at the moment. If you see the difference between using the index and the frame name or avoiding doing the homogenousTransform call from matlab, for me is clear that the c++ code runs faster and that the swig conversions are slower. So I would further improve the swig part before like avoiding the toMatlab call. Then if we are still not satisfied with the time we can try to look more deeply at the C++ code.

How do you use the output vector in Matlab? It seems that Matlab is pretty bad in for loops and if clauses. Maybe you can move some more logic in the C++ side to reduce the amount of loops done in Matlab.

That is exactly my goal with avoiding the toMatlab call. For now, after I receive the vector of matrices, I need to transform each to a matlab variable, so I enter a for loop converting each from c++ to matlab, for efficiency in the update function I use the same for loop to set the resulting matrix as the transform for the meshes. See updateVisualization

fjandrad commented 4 years ago

Found a bug, it was not updating all the transforms just the first. Real times are:

% using iDyntree string vector
stringVector_time =    0.0157
% using a string cell array with the link names
cellArray_time =    0.0241

Still improved but not as dramatically. timeUpdateComp

Fixed in 730ac9f5a61b28b55d09c7643cd222f7b209e157

Benchmarking code (to use rename file extension to .m): testGetWorldTransformTimes.txt

S-Dafarra commented 4 years ago

While I was thinking about what could be the cause for the cellArray to be taking so much more time than the stringVector, I noticed that you are passing this input by copy here. Maybe it is taking more time because each time you call it, Matlab has to convert this data structure in a vector, allocating memory, which is then destroyed after the execution. A simple fix is to pass a const reference to the vector (like in the first comment) and define this vector only once at the beginning, before iterating getWorldTransforms. This goes in the direction of "allocating memory at every cycle is bad".

traversaro commented 4 years ago

Exactly, converting the input to const std::vector<std::string>& frameNames is definitely a good idea.

fjandrad commented 4 years ago

Test between function signatures

std::vector<iDynTree::Matrix4x4> getWorldTransformsAsHomogeneous(const std::vector<std::string>& frameNames);

std::vector<iDynTree::Matrix4x4> getWorldTransformsAsHomogeneousNoConst(std::vector<std::string>& frameNames);

traj\experiments	const	noConst
100 iterations	0.0131	0.0130
100 iterations 2	0.0188	0.0185
100 iterations 3	0.0132	0.0131
100 iterations 4	0.0133	0.0132
300 iterations	0.0129	0.0129

Seems that the change of signature makes no difference. I will leave it const since is more proper on c++ side even if there is no time benefit

S-Dafarra commented 4 years ago

Thanks for trying, but in the code it seems you forgot the & https://github.com/robotology/idyntree/blob/f1f61356eb7e5ad6606d218613ea90af9dc9b609/src/high-level/include/iDynTree/KinDynComputations.h#L388

You are still passing by copy 😁

Btw..at what time did you test it? 😲

fjandrad commented 4 years ago

You are still passing by copy grin

Oops, you are right I think the answer to this is the answer to the next question haha.

Btw..at what time did you test it? astonished

Baby woke me up at 4 couldn't go back to sleep so might as well work a bit. Guess I wasn't as awake as I thought. I'll redo the tests and see if it changes.

fjandrad commented 4 years ago

traj\experiments	const	noConst	plot
500 iterations	0.0125	0.0125

Seems that the change of signature makes no difference. I will leave it const since is more proper on c++ side even if there is no time benefit

It is still valid. I guess reaffirms my idea that is not the C++ code doing the slowing down.

fjandrad commented 4 years ago

By the way I found this:

Please note that for const parameters or return types used in a function, SWIG pretty much ignores the fact that these are const, see the section on const-correctness for more information.

Here

S-Dafarra commented 4 years ago

It is still valid. I guess reaffirms my idea that is not the C++ code doing the slowing down.

Too bad. I agree that the C++ implementation may not change that figure too much, but I guess that at least its interface may have an effect. If I understood well, the bindings do not copy the implementation, but with some black magic (similar to the one to launch C++ code in Java applications) it is able to create a function which call the corresponding method in the C++ library. Hence, if this method in particular requires a lot of time (and not all the bindings methods), and this time change depending on the type of output (as you shown above), then I would infer that what matters it's in the signature of this method. The other possibility is that the method itself is costly, also in the C++ side.

Regarding the "const" by itself, I was not expecting any difference. Apart from some compiler optimization, it should not change much from the performance point of view. Actually, in the past we noticed that you can modify from Matlab some variables which are not supposed to be modified (they were marked const in C++). I was hoping in some more difference when using the reference though.

fjandrad commented 4 years ago

Finally, I was able to finish the extra funciton in swig `toMatlab` for the Matrix4x4Vector. The results are the following:	traj\experiments	getHT	getHT + use Iterator	getHT + to Matlab
100 iterations	0.0005	0.0135	0.0006

toMatlabComp2

As seen from the results, the main time consumption now is in the actual getWorldTransformsAsHomogenous function call. Not sure if its in the C++ side or in the swig conversion side.

The time for the updates becomes: toMatlabComp

This time we were able to get a x5 times faster update in the visualization. The difference with respect to the previous timings seems to be matlab going inside the for loop to update the transform objects. As can be seen here: toMatlabComp3

So now matlab for calls and assignment of matrices is the bottleneck.

Added in e398eb95d3d7b8e6b644d30971c2926b7907659a

Scripts used to test: testGetWorldTransformTimes.txt testToMatlab.txt

fjandrad commented 4 years ago

PR merged so we can close this.

robotology / idyntree

Create function that returns mutiple transforms based on a list of frames #659