Open S-Dafarra opened 5 years ago
Today we had again a similar issue, but this time the robot was wired (the Wi-Fi router was switched off) and the delay was more than 100ms. I remember back in time we had a similar issue because of the IMU. Do you think this may be caused by another sensor? @diegoferigo @traversaro
During the Maker faire Demos it was happening that reading the cartesianEndEffectorWrench ports was returning a null pointer for 10 times (trying every millisecond) in a row. The lines which do this check are https://github.com/robotology/walking-controllers/blob/master/modules/Walking_module/src/RobotHelper.cpp#L65-L85
I may be wrong, but if WBD is running with a frequency of 100 Hz, I think it make sense that polling on the port every 1 ms only works 1 out of 10 times, no? Considering that the communication happens over a lossy and non deterministic network, it is possible that in some cases the data is only available every 20 ms, or after 11~12 milliseconds.
I remember that debugging that Bosh IMU problem has been quite demanding, and it was solved by profiling the code. Is the problem repeatable enough?
Is the problem repeatable enough?
It happens especially when we have demos :sweat_smile:
Best period to debug this problem than :)
I was scanning quickly the code and I spotted this line:
Is this contacting the yarpserver
?
Today we had again a similar issue, but this time the robot was wired (the Wi-Fi router was switched off) and the delay was more than 100ms. I remember back in time we had a similar issue because of the IMU. Do you think this may be caused by another sensor? @diegoferigo @traversaro
How do you know that the data was not read for 100 ms?
In general, such a delay could be due to the WBD thread not being scheduled or not running for a lot of time for some reason (in the case of IMU, the read
method was blocking for a long time) or because something in the YARP network communication is not working fine.
I was scanning quickly the code and I spotted this line:
Is this contacting the
yarpserver
?
I have no idea, but that part of code is a useless leftover, feel free to remove it as for sure it creates more harm then benefit.
Is this contacting the
yarpserver
?
Apparently not, but it is using some kind of mutex so it is a good idea to remove it in any case: https://github.com/robotology/yarp/blob/cbd800e618d89ae5790af8d75148f02daf203d29/src/libYARP_OS/src/PortCore.cpp#L1473 .
How do you know that the data was not read for 100 ms?
The walking tried to read from the external wrench ports for 100 times, once every millisecond, but it was receiving NULL
from both the ports all the time.
In general, such a delay could be due to the WBD thread not being scheduled or not running for a lot of time for some reason (in the case of IMU, the read method was blocking for a long time) or because something in the YARP network communication is not working fine.
We know that FT boards are currently having firmware issues (https://github.com/robotology/icub-tech-support/issues/854) experiencing crashing once in a while, so I was hoping to run that profiling again.
Apparently not, but it is using some kind of mutex so it is a good idea to remove it in any case: https://github.com/robotology/yarp/blob/cbd800e618d89ae5790af8d75148f02daf203d29/src/libYARP_OS/src/PortCore.cpp#L1473 .
I saw the same, but PortCore
seems disappeared in devel
.
At that time I had to develop a ThreadStats
class https://github.com/robotology/yarp/pull/1230, that for lack of time was never merged upstream (and we never needed it again since).
As per https://github.com/robotology/yarp/pull/1230#issuecomment-544891824, I would suggest to either use more advanced libraries like Celtoys/Remotery, or embed them in a new class to propose upstream.
How do you know that the data was not read for 100 ms?
The walking tried to read from the external wrench ports for 100 times, once every millisecond, but it was receiving
NULL
from both the ports all the time.
So this line https://github.com/robotology/walking-controllers/blob/f70d2646bb564e4aea607c8f9463b7aec390d04c/modules/Walking_module/src/WalkingModule.cpp#L579 was modified to have 100? This is not related to the issue itself, but I noticed that the controller waits until the feedback is available, that is a bit a dangerous strategy. Have you considered using the old values without stopping the controller, and actually stop the controller when the maximum timeout is reached?
So this line https://github.com/robotology/walking-controllers/blob/f70d2646bb564e4aea607c8f9463b7aec390d04c/modules/Walking_module/src/WalkingModule.cpp#L579 was modified to have 100? This is not related to the issue itself, but I noticed that the controller waits until the feedback is available, that is a bit a dangerous strategy. Have you considered using the old values without stopping the controller, and actually stop the controller when the maximum timeout is reached?
Yes, exactly, it was a quick test but that error got triggered anyways. Regarding the safety issues, we can think about it.
As per robotology/yarp#1230 (comment), I would suggest to either use more advanced libraries like Celtoys/Remotery, or embed them in a new class to propose upstream.
Do you think we would need all this machinery for a first study?
Do you think we would need all this machinery for a first study?
No, probably just adding a double variable in the WBD in which you save the last end of execution, and you can check from run to run if the difference of execution time is not so different from the period.
During the Maker faire Demos it was happening that reading the
cartesianEndEffectorWrench
ports was returning a null pointer for 10 times (trying every millisecond) in a row. The lines which do this check are https://github.com/robotology/walking-controllers/blob/master/modules/Walking_module/src/RobotHelper.cpp#L65-L85This seemed to happen mainly when using Wi-Fi, but it did not see to be a range problem as the robot was walking right in front of the router.
@GiulioRomualdi