robotology / whole-body-estimators

YARP devices that implement estimators for humanoid robots.
26 stars 12 forks source link

Wholebodydynamics may not update the force measurements for more than 100ms #20

Open S-Dafarra opened 5 years ago

S-Dafarra commented 5 years ago

During the Maker faire Demos it was happening that reading the cartesianEndEffectorWrench ports was returning a null pointer for 10 times (trying every millisecond) in a row. The lines which do this check are https://github.com/robotology/walking-controllers/blob/master/modules/Walking_module/src/RobotHelper.cpp#L65-L85

This seemed to happen mainly when using Wi-Fi, but it did not see to be a range problem as the robot was walking right in front of the router.

@GiulioRomualdi

S-Dafarra commented 5 years ago

Today we had again a similar issue, but this time the robot was wired (the Wi-Fi router was switched off) and the delay was more than 100ms. I remember back in time we had a similar issue because of the IMU. Do you think this may be caused by another sensor? @diegoferigo @traversaro

traversaro commented 5 years ago

During the Maker faire Demos it was happening that reading the cartesianEndEffectorWrench ports was returning a null pointer for 10 times (trying every millisecond) in a row. The lines which do this check are https://github.com/robotology/walking-controllers/blob/master/modules/Walking_module/src/RobotHelper.cpp#L65-L85

I may be wrong, but if WBD is running with a frequency of 100 Hz, I think it make sense that polling on the port every 1 ms only works 1 out of 10 times, no? Considering that the communication happens over a lossy and non deterministic network, it is possible that in some cases the data is only available every 20 ms, or after 11~12 milliseconds.

diegoferigo commented 5 years ago

I remember that debugging that Bosh IMU problem has been quite demanding, and it was solved by profiling the code. Is the problem repeatable enough?

S-Dafarra commented 5 years ago

Is the problem repeatable enough?

It happens especially when we have demos :sweat_smile:

diegoferigo commented 5 years ago

Best period to debug this problem than :)

S-Dafarra commented 5 years ago

I was scanning quickly the code and I spotted this line:

https://github.com/robotology/codyco-modules/blob/master/src/devices/wholeBodyDynamics/WholeBodyDynamicsDevice.cpp#L1734

Is this contacting the yarpserver?

traversaro commented 5 years ago

Today we had again a similar issue, but this time the robot was wired (the Wi-Fi router was switched off) and the delay was more than 100ms. I remember back in time we had a similar issue because of the IMU. Do you think this may be caused by another sensor? @diegoferigo @traversaro

How do you know that the data was not read for 100 ms?

In general, such a delay could be due to the WBD thread not being scheduled or not running for a lot of time for some reason (in the case of IMU, the read method was blocking for a long time) or because something in the YARP network communication is not working fine.

traversaro commented 5 years ago

I was scanning quickly the code and I spotted this line:

https://github.com/robotology/codyco-modules/blob/master/src/devices/wholeBodyDynamics/WholeBodyDynamicsDevice.cpp#L1734

Is this contacting the yarpserver?

I have no idea, but that part of code is a useless leftover, feel free to remove it as for sure it creates more harm then benefit.

traversaro commented 5 years ago

Is this contacting the yarpserver?

Apparently not, but it is using some kind of mutex so it is a good idea to remove it in any case: https://github.com/robotology/yarp/blob/cbd800e618d89ae5790af8d75148f02daf203d29/src/libYARP_OS/src/PortCore.cpp#L1473 .

S-Dafarra commented 5 years ago

How do you know that the data was not read for 100 ms?

The walking tried to read from the external wrench ports for 100 times, once every millisecond, but it was receiving NULL from both the ports all the time.

In general, such a delay could be due to the WBD thread not being scheduled or not running for a lot of time for some reason (in the case of IMU, the read method was blocking for a long time) or because something in the YARP network communication is not working fine.

We know that FT boards are currently having firmware issues (https://github.com/robotology/icub-tech-support/issues/854) experiencing crashing once in a while, so I was hoping to run that profiling again.

Apparently not, but it is using some kind of mutex so it is a good idea to remove it in any case: https://github.com/robotology/yarp/blob/cbd800e618d89ae5790af8d75148f02daf203d29/src/libYARP_OS/src/PortCore.cpp#L1473 .

I saw the same, but PortCore seems disappeared in devel.

diegoferigo commented 5 years ago

At that time I had to develop a ThreadStats class https://github.com/robotology/yarp/pull/1230, that for lack of time was never merged upstream (and we never needed it again since).

As per https://github.com/robotology/yarp/pull/1230#issuecomment-544891824, I would suggest to either use more advanced libraries like Celtoys/Remotery, or embed them in a new class to propose upstream.

traversaro commented 5 years ago

How do you know that the data was not read for 100 ms?

The walking tried to read from the external wrench ports for 100 times, once every millisecond, but it was receiving NULL from both the ports all the time.

So this line https://github.com/robotology/walking-controllers/blob/f70d2646bb564e4aea607c8f9463b7aec390d04c/modules/Walking_module/src/WalkingModule.cpp#L579 was modified to have 100? This is not related to the issue itself, but I noticed that the controller waits until the feedback is available, that is a bit a dangerous strategy. Have you considered using the old values without stopping the controller, and actually stop the controller when the maximum timeout is reached?

S-Dafarra commented 5 years ago

So this line https://github.com/robotology/walking-controllers/blob/f70d2646bb564e4aea607c8f9463b7aec390d04c/modules/Walking_module/src/WalkingModule.cpp#L579 was modified to have 100? This is not related to the issue itself, but I noticed that the controller waits until the feedback is available, that is a bit a dangerous strategy. Have you considered using the old values without stopping the controller, and actually stop the controller when the maximum timeout is reached?

Yes, exactly, it was a quick test but that error got triggered anyways. Regarding the safety issues, we can think about it.

S-Dafarra commented 5 years ago

As per robotology/yarp#1230 (comment), I would suggest to either use more advanced libraries like Celtoys/Remotery, or embed them in a new class to propose upstream.

Do you think we would need all this machinery for a first study?

traversaro commented 5 years ago

Do you think we would need all this machinery for a first study?

No, probably just adding a double variable in the WBD in which you save the last end of execution, and you can check from run to run if the difference of execution time is not so different from the period.