robotology / whole-body-estimators

YARP devices that implement estimators for humanoid robots.
24 stars 12 forks source link

Fixed the possibility of deadlock in detachAll method in wholebodydynamics and baseEstimatorV1 #146

Closed traversaro closed 2 years ago

traversaro commented 2 years ago

While debugging https://github.com/robotology/gazebo-yarp-plugins/pull/619#issuecomment-1084910317, I noticed that the gazebo_yarp_robotinterface while closing was hanging during this phase:

[INFO] interrupt1 phase starting...
[INFO] interrupt1 phase finished.
[INFO] shutdown phase starting...
[INFO] Entering action level 2 of phase shutdown
[INFO] Executing detach action, level 2 on device wholebodydynamics with parameters []

It turns out that indeed there is the possibility of a deadlock. The case is when detachAll is called by thread2 and is able to lock the this->deviceMutex when yarp::os::PeriodicThread (thread1) already started its loop (i.e. step was called) but the run method was not called, i.e. if the execution of thread1 is on a line that goes from https://github.com/robotology/yarp/blob/v3.6.0/src/libYARP_os/src/yarp/os/PeriodicThread.cpp#L226 to https://github.com/robotology/yarp/blob/v3.6.0/src/libYARP_os/src/yarp/os/PeriodicThread.cpp#L246 .

In that case, thread2 is blocked in PeriodicThread::stop (while locking this->deviceMutex) as that call contains a call to join of thread1, but thread1 was blocked at the beginning of the run method (https://github.com/robotology/whole-body-estimators/blob/v0.6.1/devices/wholeBodyDynamics/WholeBodyDynamicsDevice.cpp#L2704) waiting for this->deviceMutex to be available, a classical example of deadlock.

A simple way to get rid of the deadlock is just call the stop of a thread before the this->deviceMutex is locked.

traversaro commented 2 years ago

Interstingly, when using wholebodydynamics with gazebo_yarp_robotinterface this was happening all the time. I am wondering if this was ever happening with yarprobotinterface. A question for people that used the robot recently (I guess @S-Dafarra @GiulioRomualdi @isorrentino @gabrielenava @Giulero but I am just guessing, feel free to tag other people): where you able to close with a simple Ctrl+C the yarprobotinterface that contained the wholebodydynamics device, or it was a necessary multiple Ctrl+C ?

S-Dafarra commented 2 years ago

Interstingly, when using wholebodydynamics with gazebo_yarp_robotinterface this was happening all the time. I am wondering if this was ever happening with yarprobotinterface. A question for people that used the robot recently (I guess @S-Dafarra @GiulioRomualdi @isorrentino @gabrielenava @Giulero but I am just guessing, feel free to tag other people): where you able to close with a simple Ctrl+C the yarprobotinterface that contained the wholebodydynamics device, or it was a necessary multiple Ctrl+C ?

Indeed sometimes we had to kill the yarprobotinterface. Most of the times we were blaming some joints not reaching the parking position

traversaro commented 2 years ago

Indeed sometimes we had to kill the yarprobotinterface. Most of the times we were blaming some joints not reaching the parking position

Interesting! Let's see if this PR fixes that behaviour.

gabrielenava commented 2 years ago

A question for people that used the robot recently (I guess @S-Dafarra @GiulioRomualdi @isorrentino @gabrielenava @Giulero but I am just guessing, feel free to tag other people): where you able to close with a simple Ctrl+C the yarprobotinterface that contained the wholebodydynamics device, or it was a necessary multiple Ctrl+C ?

we almost always run the yarprobotinterface through yarpmanager, sometimes it happened that we had to kill it but usually it stopped with the stop button. I don't know if in this case the stopping procedure is more elaborate than ctrl+c and resolves the problem without further intervention.

tagging @HosameldinMohamed to confirm. (EDIT: which is already reviewer of the PR so I think he received the notification already :-))

traversaro commented 2 years ago

A question for people that used the robot recently (I guess @S-Dafarra @GiulioRomualdi @isorrentino @gabrielenava @Giulero but I am just guessing, feel free to tag other people): where you able to close with a simple Ctrl+C the yarprobotinterface that contained the wholebodydynamics device, or it was a necessary multiple Ctrl+C ?

we almost always run the yarprobotinterface through yarpmanager, sometimes it happened that we had to kill it but usually it stopped with the stop button. I don't know if in this case the stopping procedure is more elaborate than ctrl+c and resolves the problem without further intervention.

If I am not wrong, the stop command should correspond to the single Ctrl+C.

traversaro commented 2 years ago

According to @Nicogene he is not experiencing any deadlock even without this PR. So probably there was some particular bug fixed in https://github.com/robotology/gazebo-yarp-plugins/pull/618 that highlighted this deadlock. In any case, I think it is something worth fixing.

HosameldinMohamed commented 2 years ago

tagging @HosameldinMohamed to confirm. (EDIT: which is already reviewer of the PR so I think he received the notification already :-))

I don't think I faced this issue with the wholeBodyDynamics instances we run, either with Gazebo or with the real robot.

traversaro commented 2 years ago

Thanks!