ubi-agni / tactile_toolbox

software to handle tactile sensors in ROS

Tactile bias #23

Closed guihomework closed 3 years ago

guihomework commented 3 years ago

This small Python script permits biasing tactile_state streams and provides a service to re-bias on the fly. The reason for this code is two-fold:
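For context, a minimal sketch of that idea (topic and service names, the std_srvs/Trigger service type, and the class layout are illustrative assumptions, not necessarily this PR's actual interface; the default sample size of 2 matches the default mentioned later in this thread):

#!/usr/bin/env python
import numpy as np
import rospy
from std_srvs.srv import Trigger, TriggerResponse
from tactile_msgs.msg import TactileState

class TactileBias(object):
    def __init__(self, sample_size=2):
        self.sample_size = sample_size
        self.samples = []  # raw samples are kept (also allows std-dev checks)
        self.bias = None
        self.pub = rospy.Publisher('tactile_states_biased', TactileState, queue_size=10)
        rospy.Subscriber('tactile_states', TactileState, self.callback)
        rospy.Service('~rebias', Trigger, self.handle_rebias)

    def handle_rebias(self, req):
        # drop the current bias; it is re-estimated from the next samples
        self.samples, self.bias = [], None
        return TriggerResponse(success=True, message='re-biasing')

    def callback(self, msg):
        values = np.asarray(msg.sensors[0].values)
        if self.bias is None:
            self.samples.append(values)
            if len(self.samples) >= self.sample_size:
                self.bias = np.mean(self.samples, axis=0)
            return  # publish nothing until the bias is initialized
        msg.sensors[0].values = (values - self.bias).tolist()
        self.pub.publish(msg)

if __name__ == '__main__':
    rospy.init_node('tactile_bias')
    TactileBias(sample_size=rospy.get_param('~sample_size', 2))
    rospy.spin()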

guihomework commented 3 years ago

Hi, thanks for the review. I think I will cancel this PR and assign the rewrite of this whole tool to you for @GereonBuescher, as I again made so many mistakes and used an outdated coding style.

I have reasons for all my choices, but you won't like them:

For my own curiosity, I will just test the speed increase with numpy to convince myself, but I will not rewrite everything in C++.

rhaschke commented 3 years ago

I have never used numpy_msg myself, but it sounds reasonable that it should be faster than handling standard lists. If you compare the performance and replace standard lists with numpy, I think that should be sufficient for this PR to get merged. I won't rewrite this code myself.

guihomework commented 3 years ago

I have never used numpy_msg myself, but it sounds reasonable that it should be faster than handling standard lists.

I have no idea why you are so convinced of this. If you did not use it yourself, how can you know that? Now that I have tested, I have some evidence, and I am now certain that one needs to evaluate case by case, because it "depends" on vector size and message type. For instance, if a message type has many vectors but only one is used, the extra ~5 µs conversion time spent numpy-deserializing all the other vectors is wasted.

I will fix the PR using numpy for the computation but not for the deserialization.
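For clarity, a sketch of the two subscription styles under discussion (one would use one or the other, not both): with numpy_msg, the generated deserialization code is swapped for a numpy-aware variant so that all numeric arrays arrive as numpy arrays; without it, values arrives as a plain tuple and is converted locally, only where needed.

import numpy as np
import rospy
from rospy.numpy_msg import numpy_msg
from tactile_msgs.msg import TactileState

def cb_plain(msg):
    # standard deserialization: convert only the one vector actually used
    values = np.asarray(msg.sensors[0].values)

def cb_numpy(msg):
    # numpy_msg deserialization: every vector is already a numpy array
    values = msg.sensors[0].values

rospy.init_node('bias_example')  # node name is illustrative
rospy.Subscriber('tactile_states', TactileState, cb_plain)
# the numpy_msg alternative (pick one of the two in practice):
# rospy.Subscriber('tactile_states', numpy_msg(TactileState), cb_numpy)
rospy.spin()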

rhaschke commented 3 years ago

Could you please provide the commit to your testing code as well?

guihomework commented 3 years ago

Could you please provide the commit to your testing code as well?

I did not prepare real "testing" code running all the tests; I just had 3 branches where I progressively updated the code:
https://github.com/ubi-agni/tactile_toolbox/tree/std_append
https://github.com/ubi-agni/tactile_toolbox/tree/std_numpy
https://github.com/ubi-agni/tactile_toolbox/tree/numpydeser_numpycompute

I then provided test input with:

rostopic pub /tactile_states tactile_msgs/TactileState "header:
  seq: 2
  stamp:
    secs: 3
    nsecs: 0
  frame_id: ''
sensors:
- name: 'val'
  values: [0.0, 1.0,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,0.0, 1.0,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,59,60,61,62,63]"  -r 1000
rhaschke commented 3 years ago

I tested performance myself and it turns out that deserializing and computing with numpy arrays is almost constant across vector sizes:

values:  10
naive:  0.14430878199999997
numpy:  0.05995462600000012
numpy msg:  0.014784321999999905

values:  100
naive:  0.2672266779999999
numpy:  0.0166489620000001
numpy msg:  0.013114491000000061

values:  1000
naive:  0.6687423239999999
numpy:  0.053259358000000034
numpy msg:  0.01223250000000009

Each test was averaged over 1000 iterations.
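For reference, a sketch of how such a deserialize-and-compute timing can be set up with timeit (an illustration under assumptions, not the actual test code; vector size, bias, and iteration count are placeholders):

import io
import timeit

import numpy as np
from rospy.numpy_msg import numpy_msg
from sensor_msgs.msg import ChannelFloat32
from tactile_msgs.msg import TactileState

N = 100  # vector size under test

# serialize one reference message up front
msg = TactileState()
msg.sensors = [ChannelFloat32(name='val', values=list(range(N)))]
buf = io.BytesIO()
msg.serialize(buf)
data = buf.getvalue()

bias = np.zeros(N, dtype=np.float32)
NumpyTactileState = numpy_msg(TactileState)  # numpy-aware (de)serialization

def run_plain():
    # standard deserialization yields a tuple; convert locally, then compute
    m = TactileState()
    m.deserialize(data)
    return np.asarray(m.sensors[0].values) - bias

def run_numpy_msg():
    # numpy_msg deserialization yields a numpy array directly
    m = NumpyTactileState()
    m.deserialize(data)
    return m.sensors[0].values - bias

print('plain    :', timeit.timeit(run_plain, number=1000))
print('numpy msg:', timeit.timeit(run_numpy_msg, number=1000))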

guihomework commented 3 years ago

I tested performance myself and it turns out that deserializing and computing with numpy arrays is almost constant across vector sizes:

[…]

From the test software you sent separately, I don't think you tested the same thing I did. I did not test the bias-averaging processing but the processing of the data AFTER initialization. Your 1000 iterations seem to just accumulate the data in the class to prepare the bias average until initialization is done, so they do not measure the real processing time of each incoming message; they only show that computing the average is slow, which I knew, but I had reasons not to compute the average with an intermediate sum. I never used a sample size of 1000; I used the default of 2 and always discarded my timings while the bias was not yet initialized.

I also see that your numpy-deserialization variant still calls numpy.asarray(data.sensors[0].values) on a values field that is already a numpy array.

I had 3 different versions of the computation (naive, numpy, and numpy_msg); you seem to use only 2 (2 classes) for the 3 cases. I believe that is not correct.

guihomework commented 3 years ago

I updated the code to use numpy for computing the average but kept the array storage. I am not sure stacking with numpy is faster; I did not evaluate it. The array storage also permits computing the standard deviation, which gives a hint whether the bias stems from a "stable" sampling or not. I originally wanted to do that but had no time, so it was left for future work; now I did it at the same time as the numpy upgrade.
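A sketch of that idea (function name and the threshold parameter are illustrative, not the PR's actual API): the stored samples yield both the bias and a per-cell spread that hints whether the sampling was "stable".

import numpy as np

def estimate_bias(sample_list, threshold=1.0):
    # sample_list: one equally sized 1-D array per received message
    samples = np.vstack(sample_list)   # stack the stored samples row-wise
    bias = samples.mean(axis=0)        # per-cell average -> the bias
    spread = samples.std(axis=0)       # per-cell standard deviation
    stable = bool(np.all(spread <= threshold))
    return bias, stable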

Similarly, in your example test program you changed the sample size, so I added the possibility for the user to change the sample size as well, since I was refactoring the node/module anyway because catkin_lint did not like the executable living in the module folder.

I hope this satisfies most of the change requests.

guihomework commented 3 years ago

I tested performance myself and it turns out that deserializing and computing with numpy arrays is almost constant:

I really did not like your result, so I spent several more hours testing, this time using your test code.

I finally found that your test code is hugely biased by the fact that the 3 tests are started in a row in the same process. Just invert the order of the tests, putting the _numpymsg test first and naive at the bottom, and suddenly naive is faster than _numpymsg; there are memory initializations that are not accounted for here.

I added code to select which test to run and started the 3 test types separately (so each one has to initialize its memory; a runner sketch for this is at the end of this comment). This gives totally different results, a little closer to what I had found:

values:  100
naive:  0.162
numpy:  0.128
numpy msg:  0.090

values:  64
naive:  0.142
numpy:  0.118
numpy msg:  0.088

values:  10
naive:  0.096
numpy:  0.086
numpy msg:  0.089

values:  1
naive:  0.072
numpy:  0.090
numpy msg:  0.090

Basically, for a vector of size 100, numpy_msg saves 128 - 90 = 38 µs compared to the variant without numpy deserialization. I had found a 3 µs difference when measuring closer to the actual operations (really only the deserialization and the numpy computation, not the function call and all other operations), but it seems that instantiating the callback and passing the data on the heap is also faster with numpy_msg, which I did not measure in my test.

Again, for a 64-sized vector it could maybe be worth saving 118 - 88 = 30 µs per message (this is now 10 times what I had found), but I am not sure this will hold in all cases, as deserializing all vectors to numpy takes more time (note that the time does not shrink for small vectors deserialized to numpy) and might not scale well with multiple fields in a message needing conversion when only one is accessed.

I would prefer to not use numpy_msg here.
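For illustration, a tiny runner for that per-run isolation (assuming the timing code lives in a hypothetical bench.py that takes the variant name as its first argument):

import subprocess
import sys

# run each variant in a fresh interpreter, so none of them benefits from
# memory/import warm-up done by a previously run variant
for variant in ('naive', 'numpy', 'numpy_msg'):
    subprocess.check_call([sys.executable, 'bench.py', variant])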

rhaschke commented 3 years ago
Indeed, the initial test is much slower than subsequent ones. Very strange. I tested again myself and also distinguished between bias computation (1st row) and bias usage (2nd row):

values     10     64     100    1000
naive1     0.080  0.148  0.211  1.378
           0.079  0.144  0.225  1.503
naive2     0.070  0.096  0.141  0.836
           0.070  0.081  0.125  0.549
numpy      0.101  0.099  0.149  0.429
           0.092  0.090  0.133  0.423
numpy msg  0.100  0.080  0.102  0.091
           0.092  0.072  0.092  0.088

The results are more similar to yours now. Nevertheless, using numpy for deserialization and computation is the most performant option in most cases. Only for very small vector sizes does it seem more beneficial to use standard lists. Hence, I don't really understand why you prefer not to use numpy_msg.

guihomework commented 3 years ago

The results are more similar to yours now. Nevertheless, using numpy for deserialization and computation is the most performant option in most cases. Only for very small vector sizes does it seem more beneficial to use standard lists. Hence, I don't really understand why you prefer not to use numpy_msg.

Thanks for redoing your test. Reasons not to switch:

  • not (widely) used after 11 years of existence!
  • most of our messages are smaller than 64, rather 20 or 12 or even 4 => less performant then
  • unsure about other types of messages, and overhead of converting the whole message to numpy

rhaschke commented 3 years ago
  • not (widely) used after 11 years of existence!

That's not really a counter-argument, is it?

  • most of our messages are smaller than 64, rather 20 or 12 or even 4 => less performant then

Your code doesn't yet apply to the Shadow Hand sensors, does it? It is constrained to sensors[0]. Hence, I thought it was (primarily) targeting the tactile glove... In any case, the timing differences for small vector sizes are marginal. Given the much higher code readability, I would still prefer the vectorized version.

  • unsure about other types of messages, and overhead of converting the whole message to numpy

Not sure what you actually mean by this. No other messages are affected. The "overhead of converting the whole message" is already measured in our tests. By the way, only vectors are handled as numpy arrays, nothing else.

guihomework commented 3 years ago
  • not (widely) used after 11 years of existence!

That's not really a counter-argument, is it?

You never switched to this before in any of your code; nobody did. There must be a reason. Why would such efficient code not be used (almost) anywhere?

  • most of our messages are smaller than 64, rather 20 or 12 or even 4 => less performant then

Your code doesn't yet apply to the Shadow Hand sensors, does it? It is constrained to sensors[0].

Correct, but we also have the poking stick, for instance, and others using a single sensors[0] too. And even if I adapt my code to support N sensors, the hand will then have smaller vectors.

Hence, I thought it was (primarily) targeting the tactile glove... In any case, the timing differences for small vector sizes are marginal. Given the much higher code readability, I would still prefer the vectorized version.

I would argue that the win is marginal. Also, memory initialization occurs during normal use in ROS, with incoming data arriving via the middleware (not with the callback merely called as a function), so I would say this test is still biased and does not reflect real usage.

  • unsure about other types of messages, and overhead of converting the whole message to numpy

Not sure what you actually mean by this. No other messages are affected. The "overhead of converting the whole message" is already measured in our tests. By the way, only vectors are handled as numpy arrays, nothing else.

I am saying that numpy_msg is not a "solution in all cases". In future cases, one needs to stay vigilant and redo similar timing tests. I suspect that for a message like JointState, which contains 3 vectors of N values, deserializing all 3 vectors to numpy would create more overhead (because, as we have seen, that conversion has a cost) when only position, say, is used by the main loop (no velocity, no effort); in that case only position could be converted to numpy locally and be totally fine in terms of efficiency, probably better.
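To illustrate the concern with sensor_msgs/JointState (which carries position, velocity, and effort vectors): with the plain message class, only the vector the callback actually uses is converted, while numpy_msg would pay the conversion cost for all three up front. A sketch (node and topic names are illustrative):

import numpy as np
import rospy
from sensor_msgs.msg import JointState

def cb(msg):
    # only position is needed; velocity and effort stay as plain tuples,
    # so their conversion cost is never paid
    position = np.asarray(msg.position)
    rospy.logdebug('mean position: %f', position.mean())

rospy.init_node('joint_listener')
rospy.Subscriber('joint_states', JointState, cb)
rospy.spin()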

Last point: I don't like to change just for the sake of change. Change involves risks. If a change is not necessary, not changing avoids taking extra risks. A simple benefit/risk ratio.

rhaschke commented 3 years ago

I would argue that the win is marginal.

Looks like we agree that the performance difference (in one direction or the other) is marginal for small vectors. For larger vectors, numpy quickly becomes strongly better... However, readability is much better with the vectorized version in all cases, which is a clear win. You didn't comment on that.

I am saying that numpy_msg is not a "solution in all cases".

But that's not in question here. We are talking, all along, only about the tactile_bias.py script, aren't we?

I don't like to change, to just change. Change involve risks.

We don't change anything; we introduce a new script. But, indeed, we might want to think about applying this new scheme somewhere else as well.

guihomework commented 3 years ago

I would argue that the win is marginal.

Looks like we agree that the performance difference (in one direction or the other) is marginal for small vectors.

I expressed myself badly. I don't think that for tactile_bias the win of using numpy_msg is significant or worth the risk of relying on rarely used software.

For larger vectors, numpy quickly becomes strongly better... However, readability is much better with the vectorized version in all cases, which is a clear win. You didn't comment on that.

I never said I would not use the vectorized form; I already pushed commits using numpy and said so. I just won't use numpy_msg (too unknown to me).

I am saying that numpy_msg is not a "solution in all cases".

But that's not in question here. We are talking, all along, only about the tactile_bias.py script, aren't we?

I want to remind you not to take the example and evaluation done for this PR for granted. I wrote that before and I stand by it. That's it.

I don't like to change, to just change. Change involve risks.

We don't change anything; we introduce a new script. But, indeed, we might want to think about applying this new scheme somewhere else as well.

That's a play on words. Introducing a new library IS a change. The PR uses tested code that was started one year ago, was used to produce data, and was re-tested (with a tiny change removing the calibration part).

So for me tactile_bias is simple and, properly speaking, not new code, but numpy_msg is unknown to me.

Please create a subsequent PR that makes the change to numpy_msg, and merge this one, which already contains significant compromises matching the review comments.

guihomework commented 3 years ago

I wrote this in my very first answer:

I think I will cancel this PR and assign the rewrite of this whole tool to you for @GereonBuescher, as I again made so many mistakes and used an outdated coding style.

To which, in the code comments, you answered:

I won't rewrite this code myself.

Which led me to continue applying the changes to match the review comments and get it merged (required for a customer), but less than 10% of my code remains in PR #24. I should have cancelled PR #23 so that both of us saved time. Sorry.

The only thing I gained from this "adventure" is the insight that numpy_msg needs further evaluation, while in most cases numpy should be used for the average computation.