ubi-agni / tactile_toolbox

software to handle tactile sensors in ROS

Generalize tactile_bias.py to handle multiple sensors #24

Closed: rhaschke closed this 3 years ago

rhaschke commented 3 years ago

... and deserialize msgs into numpy arrays, which is more efficient than deserializing into a temporary list.
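A rough sketch of the idea (the message layout and field names here are made up, not the actual tactile_msgs definition):

```python
from collections import namedtuple
import numpy as np

# Stand-in for the incoming ROS message; the real field names may differ.
Sensor = namedtuple("Sensor", "name values")

def extract(sensors):
    # one float64 numpy array per sensor, skipping the intermediate
    # element-by-element copy into a Python list
    return {s.name: np.asarray(s.values, dtype=np.float64) for s in sensors}

print(extract([Sensor("left", (0.0, 1.0, 2.0))]))
```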

guihomework commented 3 years ago

This code introduces 2 regressions:

* no protection against change of size of the incoming data
* computation of stddev does not give the same result as past tests (I think there should not be a 1/(N-1); if counting starts at 0, one should divide by the number of elements both in the expectation of the squares and in the expectation of the average).
  See the pasted result below, where the incoming data is the same message repeated, so the stddev should be zero, but it is not. With real data this error is not easy to see, but with the test data (as done in PR #23) one immediately sees in the printed output that the stddev is wrong.

Missing functionality linked to the switch to multiple-sensor support:

* the display message should be upgraded to include the sensor name; otherwise one sees 2 identical messages without being able to tell which channel is which (except by the sensor length, but the lengths might not necessarily differ)

[INFO] [1608299035.178256]: Acquired 300 samples
[INFO] [1608299035.182675]: computed bias: [ 0.  1.  2.  3.  4.  5. ... 63.]
[INFO] [1608299035.185463]: std deviation: [0.  0.05783149  0.11566299  0.17349448 ... 3.64338407]
[INFO] [1608299035.186954]: Acquired 300 samples
[INFO] [1608299035.188800]: computed bias: [ 0.  2.  3.  4.  5.  6. ... 63.]
[INFO] [1608299035.190407]: std deviation: [0.  0.11566299  0.17349448  0.23132597 ... 3.64338407]
rhaschke commented 3 years ago
  • no protection against change of size of the incoming data

I dropped that on purpose. I cannot see a use case where you would want to change the dimensionality of your sensor data. The only situation I could come up with is that multiple publishers publish to the same topic and thus report different sensor sizes. In the new code, I handle this by keeping a separate bias per sensor name - assuming, of course, that different publishers use different sensor names.
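A minimal sketch of that bookkeeping (all names are hypothetical): one entry per sensor name, so publishers reporting different sizes on the same topic never clash:

```python
import numpy as np

samples = {}  # sensor name -> list of sample arrays

def add_sample(name, values):
    # arrays of different lengths simply live under different keys
    samples.setdefault(name, []).append(np.asarray(values, dtype=np.float64))

add_sample("myrmex_left", [0.1] * 256)
add_sample("myrmex_right", [0.2] * 64)  # different size, no conflict
```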

  • computation of stddev does not give the same result as past tests (I think there should not be a 1/(N-1)).

Good that you noticed that. The proposed change is correct: to compute the stddev of a sample, we need to normalize by N-1. See here for a theoretical (statistical) explanation.
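For reference, numpy exposes exactly this choice via the ddof argument; note that either normalization yields zero for constant input, so the non-zero values pasted above must stem from something else:

```python
import numpy as np

a = np.tile([0., 1., 2., 3.], (300, 1))  # 300 identical samples
print(np.std(a, axis=0, ddof=1))         # sample stddev, 1/(N-1): [0. 0. 0. 0.]
print(np.std(a, axis=0, ddof=0))         # population stddev, 1/N: [0. 0. 0. 0.]
```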

  • the display message should be upgraded to include the sensor name

I thought about this. However, implementing it would require passing the name parameter into the compute() function, which I didn't think was worth the extra computational effort. Actually, I would like to change the log level of those messages to DEBUG (or remove them altogether); there is no particular value in seeing them.
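For what it's worth, the change would be small; a sketch, assuming compute() or its caller gains access to the name (all names here are hypothetical):

```python
import rospy

def report(name, bias, std):
    # DEBUG level, as suggested above; the sensor name disambiguates channels
    rospy.logdebug("[%s] computed bias: %s", name, bias)
    rospy.logdebug("[%s] std deviation: %s", name, std)
```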

guihomework commented 3 years ago
  • no protection against change of size of the incoming data

I dropped that on purpose.

Dropping functionality should, IMHO, be mentioned in the commit message, at least in the details if not in the summary; otherwise the reviewer cannot know it was done on purpose.

  • computation of stddev does not give the same result as past tests (I think there should not be a 1/(N-1)).

Good that you noticed that. The proposed change is correct: to compute the stddev of a sample, we need to normalize by N-1. See here for a theoretical (statistical) explanation.

So you imply that if I provide 300 identical samples, the standard deviation is not null? Octave, at least, says it is:

a=ones(200,1)*(1:1:64);
octave:5> std(a)
ans =

 Columns 1 through 20:

   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

 Columns 21 through 40:

   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

 Columns 41 through 60:

   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

 Columns 61 through 64:

   0   0   0   0

I ask because your code (the "proposed change", as you call it) does not give a null result in my test (see the pasted output in my initial comment).

  • the display message should be upgraded to include the sensor name

I thought about this. However, implementing it would require passing the name parameter into the compute() function, which I didn't think was worth the extra computational effort. Actually, I would like to change the log level of those messages to DEBUG (or remove them altogether); there is no particular value in seeing them.

Please discuss with @GereonBuescher what he needs and what not, or with Risto, who was the original contractor for this feature. I personally like this display: it tells me the bias happened and lets me check whether it is a valid bias.

rhaschke commented 3 years ago

Dropping functionality should, IMHO, be mentioned in the commit message, at least in the details if not in the summary; otherwise the reviewer cannot know it was done on purpose.

Yes, I agree. Sorry for writing the commit message in a rush - as usual.

So you imply that if I provide 300 identical samples, the standard deviation is not null?

Oops. That's a bug! Thanks for pointing that out. Fixed now.
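The pasted numbers are consistent with one plausible form of the bug: for a constant channel value c and N = 300 samples, sqrt(sum_sq/(N-1) - mean**2) evaluates to c/sqrt(299) ≈ 0.0578·c, which is exactly what was printed. A minimal sketch of the corrected two-term formula (function and variable names are made up):

```python
import numpy as np

def bias_and_std(samples):
    a = np.asarray(samples, dtype=np.float64)  # shape (N, channels)
    n = a.shape[0]
    mean = a.sum(axis=0) / n
    sum_sq = (a ** 2).sum(axis=0)
    # unbiased sample variance: (sum_sq - N * mean^2) / (N - 1),
    # NOT sum_sq / (N - 1) - mean^2
    var = (sum_sq - n * mean ** 2) / (n - 1)
    return mean, np.sqrt(np.maximum(var, 0.0))  # clamp tiny negative rounding

mean, std = bias_and_std([[1.0, 2.0]] * 300)
print(mean, std)  # -> [1. 2.] [0. 0.]
```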

I dropped that (dimension check) on purpose

Do you still see a need for this check, or not?

guihomework commented 3 years ago

Dropping functionality should, IMHO, be mentioned in the commit message, at least in the details if not in the summary; otherwise the reviewer cannot know it was done on purpose.

Yes, I agree. Sorry for writing the commit message in a rush - as usual.

So you imply that if I provide 300 identical samples, the standard deviation is not null?

Oops. That's a bug! Thanks for pointing that out. Fixed now.

I admit my knowledge of the "population" vs. "sample" distinction for computing the standard deviation (σ for the population vs. s from a sample) was not good enough, hence my fix was to use N in both terms. Overall, this stddev computation is just a helper, and a human would hardly notice the difference between one fix or the other, or even with the bug. I even thought of displaying only significant stddev values, to warn the user that the sensor was not kept at rest during biasing (a warning print in case the stddev is bad on some channel could then replace the constant print), but did not implement it; see the sketch after the next paragraph.

But now that I have made myself more aware of the difference between population and sample, I am conceptually still not totally convinced that N-1 is correct here. I think the 200 samples used to store the bias are the "whole population" that we want to consider for the bias; the bias won't change anymore afterwards. Unless you consider that the noise of the sensor is constantly changing, in which case neither N nor N-1 will solve it. Consider a scale we want to tare for the weight of a container: we press the tare button, it takes N samples, and then biases the result. The container won't change mass, but is it a sub-sample of the sensor's noise or the whole population of the noise that we consider? Is the noise distributed like a population, such that the mean would indeed differ between the sample and the "whole" recording?

conceptually this is disturbing.
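Coming back to the warning idea mentioned above, it could look roughly like this (the threshold and all names are made up for illustration):

```python
import numpy as np
import rospy

def warn_if_noisy(name, std, threshold=1.0):
    # report only channels whose stddev suggests the sensor was not at rest
    noisy = np.flatnonzero(np.asarray(std) > threshold)
    if noisy.size:
        rospy.logwarn("[%s] large stddev on channels %s: was the sensor at rest?",
                      name, noisy.tolist())
```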

I dropped that (dimension check) on purpose

Do you still see a need for this check, or not?

I only faced this error message when I tried to change the size of the vectors during testing; it won't occur during normal operation. So saving one line and one test might, again, win a few microseconds, which seem to matter a lot. I like "idiot proof" software, but software cannot be compact and fast AND idiot proof. So it is your choice whether to revert this drop, depending on whether you expect the software to be used by non-trained users.
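For the record, the dropped guard amounts to roughly one comparison per message; a sketch with hypothetical names, just to make the trade-off concrete:

```python
def accumulate(buffer, values):
    # reject messages whose size differs from the first one received
    if buffer and len(values) != len(buffer[0]):
        raise ValueError("sensor size changed: %d -> %d"
                         % (len(buffer[0]), len(values)))
    buffer.append(values)
```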

rhaschke commented 3 years ago

conceptually this [the difference between sample and population mean] is disturbing.

It's as simple as that: If you want to compute the std-dev of a finite sample of a larger distribution (here the distribution of sensor values at rest), then you should use N-1. But, as you mentioned, this is nitpicking now.
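A quick numerical check of that statement: averaging the variance estimate over many small samples drawn from a known distribution, the 1/(N-1) estimator is unbiased while 1/N systematically underestimates:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(100000, 5))  # 100000 samples of size N=5, true var = 1
print(np.var(x, axis=1, ddof=1).mean())     # ~1.0 (unbiased)
print(np.var(x, axis=1, ddof=0).mean())     # ~0.8  (= (N-1)/N)
```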

Do you still see a need for this check, or not?

I only faced this error message when I tried to change the size of the vectors during testing; it won't occur during normal operation

Good. Then I made the right decision in removing it. Production code shouldn't handle failure cases that occur only during testing.

guihomework commented 3 years ago

conceptually this [the difference between sample and population mean] is disturbing.

It's as simple as that: If you want to compute the std-dev of a finite sample of a larger distribution (here the distribution of sensor values at rest), then you should use N-1. But, as you mentioned, this is nitpicking now.

I don't understand. The "larger distribution of sensor values at rest" exists only during a finite time (while the sensor is at rest), and we can sample the whole population of this finite time, can't we? I just want to understand the concept, not ask you to change the value.

rhaschke commented 3 years ago

No, in principle we could sample forever in this rest state. It's our decision to stop after N samples.