Closed fjandrad closed 4 years ago
A first dumb implementation could simply use a for
cycle and call the getWorldTransform
method for each of the requested frames, checking that they actually exist. Later, we can try to identify if there is some bottleneck.
For testing purposes the following funcitons were implemented:
std::vector<iDynTree::Transform> getWorldTransforms( std::vector<int> frameIndexes);
std::vector<iDynTree::Transform> getWorldTransformsIndex( std::vector<int> frameIndexes);
std::vector<iDynTree::Transform> getWorldTransforms(std::vector<std::string> frameNames);
std::vector<iDynTree::Matrix4x4> getWorldTransformsAsHomogeneous(std::vector<std::string> frameNames);
For testing/benchmarking the following script was used:
testGetWorldTransformTimes.txt
The results are
The lines with ind as legend means get each transform individually in a for loop. The others are in the same order as the mentioned implemented functions. This suggest that any improvements on the C++ side might have negligible impact on the overall timings in Matlab.
It can be seen that there seems to be no gain in getting all the transforms except for the function that gives directly the homogenous transform. The reason is probably that even if we get a vector of transforms we still need to iterate through the vector and transform each individual transform to matlab matrix. The reason homogeneous transform seem to have the upper hand is because we save the asHomogeneousTransform step in the bindings.
Here we can se why it takes less time to get directly the homogeneous transform:
Despite it takes half the time to do just toMatlab, the time decreases by a third since we go through the iterator to ge the individual matrix and then do to Matlab. If we do it in a single line we get a further optimization although small.
I doubt it is possible, but what could further help for the visualization in terms of time optimization is to be able to get from a single swig function directly matlab matrices without the need to do the toMatlab call.
Looking at the code I see there is some logic to get the semantics. If we remove the logic for the semantics that is not being used in the visualization in Matlab we get:
It seems removing the semantics does not give any gain.
With the changes and the use of the new function we have that the mean of 100 update times are:
% using iDyntree string vector
stringVector_time = 0.0018
% using a string cell array with the link names
cellArray_time = 0.0275
It is an order of magnitude difference.
Done in 2a846e0738582f9d671e63ccf538a1cf1265e067
It seems that CI is failing. The message is:
@traversaro this does not seem related to the changes I did. Could I proceed to open a pull request? Should I open an issue for this error?
It seems that CI is failing.
See also https://github.com/robotology/idyntree/pull/667#issuecomment-608435760. Probably it is enough for you to rebase on master.
I would have some curiosities:
What is the difference between std::vector<iDynTree::Transform> getWorldTransforms( std::vector<int> frameIndexes);
and std::vector<iDynTree::Transform> getWorldTransformsIndex( std::vector<int> frameIndexes);
.
In C++ this implementation may be inefficient because you are returning by copy a vector. If instead you pass the output vector as a parameter, you can resize it once beforehand. Then, multiple calls of this function would avoid further memory allocation, which can be very costly in C++. Is it possible in Matlab to pass the output vector as a parameter or there are difficulties? Would it make any difference on the time?
In case it is possible to pass the output vector as a parameter, to which object would it correspond in the C++ side? Is it possible to write an implementation which can avoid the toMatlab
or fromMatlab
(if it exists) calls? Maybe using templates?
Is it worth to try a similar test fully in C++ to see if those times are so high also there?
How do you use the output vector in Matlab? It seems that Matlab is pretty bad in for
loops and if
clauses. Maybe you can move some more logic in the C++ side to reduce the amount of loops done in Matlab.
It seems that CI is failing. The message is:
@traversaro this does not seem related to the changes I did. Could I proceed to open a pull request? Should I open an issue for this error?
This was fixed by https://github.com/robotology/idyntree/pull/667 , please rebase and it should work fine.
I doubt it is possible, but what could further help for the visualization in terms of time optimization is to be able to get from a single swig function directly matlab matrices without the need to do the toMatlab call.
It is possible, but you need to inject the C++ code directly in the SWIG generated code, see for example as the toMatlab call itself is implemented: https://github.com/robotology/idyntree/blob/114671fb1a012f6214b35b8462d463fea99781fd/bindings/matlab/matlab_matvec.i#L119 .
> What is the difference between std::vector<iDynTree::Transform> getWorldTransforms( std::vector<int> frameIndexes); and std::vector<iDynTree::Transform> getWorldTransformsIndex( std::vector<int> frameIndexes);.
It was a test I did when seeing that the time between using indexes and a frame name was pretty much the same. I had the doubt if maybe because of the overloading of the function something was taking a bit of extra time. So I created one that only takes integers, but the result was the same. It was deleted later since it was purely to test that.
In C++ this implementation may be inefficient because you are returning by copy a vector. If instead you pass the output vector as a parameter, you can resize it once beforehand. Then, multiple calls of this function would avoid further memory allocation, which can be very costly in C++. Is it possible in Matlab to pass the output vector as a parameter or there are difficulties? Would it make any difference on the time?
My understanding is that we can don't know if it would be more efficient. I can give it a try.
In case it is possible to pass the output vector as a parameter, to which object would it correspond in the C++ side? Is it possible to write an implementation which can avoid the toMatlab or fromMatlab (if it exists) calls? Maybe using templates?
Avoiding the toMatlab is something I also want to try, even without sending the output vector. For the C++ side I think is straight forward if I create a iDynTree.MatrixVector variable ( soon to be renamed to Matrix4x4Vector ). Now using both the sending of the output parameter and avoid using the toMatlab conversion is something that I'm not sure out my head how to do.
Is it worth to try a similar test fully in C++ to see if those times are so high also there?
I honestly doubt is worth it at the moment. If you see the difference between using the index and the frame name or avoiding doing the homogenousTransform call from matlab, for me is clear that the c++ code runs faster and that the swig conversions are slower. So I would further improve the swig part before like avoiding the toMatlab call. Then if we are still not satisfied with the time we can try to look more deeply at the C++ code.
How do you use the output vector in Matlab? It seems that Matlab is pretty bad in for loops and if clauses. Maybe you can move some more logic in the C++ side to reduce the amount of loops done in Matlab.
That is exactly my goal with avoiding the toMatlab call. For now, after I receive the vector of matrices, I need to transform each to a matlab variable, so I enter a for loop converting each from c++ to matlab, for efficiency in the update function I use the same for loop to set the resulting matrix as the transform for the meshes. See updateVisualization
Found a bug, it was not updating all the transforms just the first. Real times are:
% using iDyntree string vector
stringVector_time = 0.0157
% using a string cell array with the link names
cellArray_time = 0.0241
Still improved but not as dramatically.
Fixed in 730ac9f5a61b28b55d09c7643cd222f7b209e157
Benchmarking code (to use rename file extension to .m): testGetWorldTransformTimes.txt
While I was thinking about what could be the cause for the cellArray
to be taking so much more time than the stringVector
, I noticed that you are passing this input by copy here. Maybe it is taking more time because each time you call it, Matlab has to convert this data structure in a vector, allocating memory, which is then destroyed after the execution. A simple fix is to pass a const
reference to the vector (like in the first comment) and define this vector only once at the beginning, before iterating getWorldTransforms
. This goes in the direction of "allocating memory at every cycle is bad".
Exactly, converting the input to const std::vector<std::string>& frameNames
is definitely a good idea.
Test between function signatures
std::vector<iDynTree::Matrix4x4> getWorldTransformsAsHomogeneous(const std::vector<std::string>& frameNames);
std::vector<iDynTree::Matrix4x4> getWorldTransformsAsHomogeneousNoConst(std::vector<std::string>& frameNames);
traj\experiments | const | noConst | plot |
---|---|---|---|
100 iterations | 0.0131 | 0.0130 | |
100 iterations 2 | 0.0188 | 0.0185 | |
100 iterations 3 | 0.0132 | 0.0131 | |
100 iterations 4 | 0.0133 | 0.0132 | |
300 iterations | 0.0129 | 0.0129 |
Seems that the change of signature makes no difference. I will leave it const since is more proper on c++ side even if there is no time benefit
Thanks for trying, but in the code it seems you forgot the & https://github.com/robotology/idyntree/blob/f1f61356eb7e5ad6606d218613ea90af9dc9b609/src/high-level/include/iDynTree/KinDynComputations.h#L388
You are still passing by copy 😁
Btw..at what time did you test it? 😲
You are still passing by copy grin
Oops, you are right I think the answer to this is the answer to the next question haha.
Btw..at what time did you test it? astonished
Baby woke me up at 4 couldn't go back to sleep so might as well work a bit. Guess I wasn't as awake as I thought. I'll redo the tests and see if it changes.
traj\experiments | const | noConst | plot |
---|---|---|---|
500 iterations | 0.0125 | 0.0125 |
Seems that the change of signature makes no difference. I will leave it const since is more proper on c++ side even if there is no time benefit
It is still valid. I guess reaffirms my idea that is not the C++ code doing the slowing down.
By the way I found this:
Please note that for const parameters or return types used in a function, SWIG pretty much ignores the fact that these are const, see the section on const-correctness for more information.
It is still valid. I guess reaffirms my idea that is not the C++ code doing the slowing down.
Too bad. I agree that the C++ implementation may not change that figure too much, but I guess that at least its interface may have an effect. If I understood well, the bindings do not copy the implementation, but with some black magic (similar to the one to launch C++ code in Java applications) it is able to create a function which call the corresponding method in the C++ library. Hence, if this method in particular requires a lot of time (and not all the bindings methods), and this time change depending on the type of output (as you shown above), then I would infer that what matters it's in the signature of this method. The other possibility is that the method itself is costly, also in the C++ side.
Regarding the "const" by itself, I was not expecting any difference. Apart from some compiler optimization, it should not change much from the performance point of view. Actually, in the past we noticed that you can modify from Matlab some variables which are not supposed to be modified (they were marked const
in C++).
I was hoping in some more difference when using the reference though.
Finally, I was able to finish the extra funciton in swig toMatlab for the Matrix4x4Vector.
The results are the following: |
traj\experiments | getHT | getHT + use Iterator | getHT + to Matlab |
---|---|---|---|---|
100 iterations | 0.0005 | 0.0135 | 0.0006 |
As seen from the results, the main time consumption now is in the actual getWorldTransformsAsHomogenous
function call. Not sure if its in the C++ side or in the swig conversion side.
The time for the updates becomes:
This time we were able to get a x5 times faster update in the visualization. The difference with respect to the previous timings seems to be matlab going inside the for loop to update the transform objects. As can be seen here:
So now matlab for
calls and assignment of matrices is the bottleneck.
Added in e398eb95d3d7b8e6b644d30971c2926b7907659a
Scripts used to test: testGetWorldTransformTimes.txt testToMatlab.txt
PR merged so we can close this.
In cases where we are updating information outside of iDyntree. Such as the new Matlab visualizer .
The idea for this function is:
cc @S-Dafarra @traversaro