open-rmf / rmf_ros2

Internal ROS infrastructure for RMF
Apache License 2.0

Add a timeout before automatically releasing lift #369

Closed: mxgrey closed this 3 months ago

mxgrey commented 5 months ago

It's been discovered that a race condition can cause a deadlock when lift session usage is combined with lane mutexes.

If a robot locks the lane mutexes for the lift and begins a lift session, but a replan occurs before the robot enters the lift, there is a narrow window in which the robot might automatically drop the lift session. If another robot is summoning the lift at the same time, that other robot could take over the lift session. However, the first robot will still be holding the lane mutexes. At that point, the first robot will be waiting to lock the lift session while the second robot will be waiting to lock the lane mutexes, and neither can proceed.
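The sequence above can be replayed with a small single-threaded model. This is only an illustrative sketch: `ResourceModel`, `replay`, and the robot and resource names are hypothetical and are not part of rmf_ros2. It also assumes the second robot requests the lift session first and the lane mutex second.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical model: each resource has at most one holder and each robot
// waits on at most one resource. Not the rmf_ros2 data structures.
struct ResourceModel
{
  std::map<std::string, std::string> holder;   // resource -> robot holding it
  std::map<std::string, std::string> waiting;  // robot -> resource it waits for

  // Take the resource if it is free (or already ours); otherwise record a wait.
  bool acquire(const std::string& robot, const std::string& resource)
  {
    const auto it = holder.find(resource);
    if (it != holder.end() && it->second != robot)
    {
      waiting[robot] = resource;
      return false;
    }
    holder[resource] = robot;
    waiting.erase(robot);
    return true;
  }

  void release(const std::string& robot, const std::string& resource)
  {
    const auto it = holder.find(resource);
    if (it != holder.end() && it->second == robot)
      holder.erase(it);
  }

  // Follow robot -> awaited resource -> its holder. Arriving back at the
  // starting robot means there is a circular wait.
  bool deadlocked(const std::string& start) const
  {
    std::string robot = start;
    for (std::size_t i = 0; i <= waiting.size(); ++i)
    {
      const auto w = waiting.find(robot);
      if (w == waiting.end()) return false;
      const auto h = holder.find(w->second);
      if (h == holder.end()) return false;
      robot = h->second;
      if (robot == start) return true;
    }
    return false;
  }
};

// Replays the race. `premature_release` models the automatic drop of the
// lift session that follows a replan.
bool replay(bool premature_release)
{
  ResourceModel m;
  const bool a_ready = m.acquire("robot_A", "lane_mutex")
    && m.acquire("robot_A", "lift_session");
  assert(a_ready);
  (void)a_ready;

  if (premature_release)
    m.release("robot_A", "lift_session");

  // robot_B only goes on to the lane mutex once it owns the lift session.
  if (m.acquire("robot_B", "lift_session"))
    m.acquire("robot_B", "lane_mutex");

  // robot_A tries to get its lift session back.
  m.acquire("robot_A", "lift_session");
  return m.deadlocked("robot_A");
}
```

In this model `replay(true)` ends in a circular wait, while `replay(false)` does not, because robot_A never gives up the lift session and robot_B simply waits its turn.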

The automatic dropping of the lift session happens because of a blunt-force mechanism that tries to identify when a robot is holding onto a lift session without really needing it. When a replan occurs while the robot is outside of the lift, there is a very narrow window in which that mechanism will pick up a false positive and trigger the release. This PR attempts to soften the mechanism by requiring a 30-second window to pass before the release happens automatically. This ensures that a quick replan will not trigger the mechanism.
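As a rough illustration of the grace-period idea, here is a minimal sketch. It is not the code in this PR: the `LiftReleaseTimer` class and its `should_release` method are hypothetical names, and the sketch assumes the caller already has some check that reports whether the session still looks needed.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical helper illustrating a grace period before an automatic
// release. Not the rmf_ros2 implementation.
class LiftReleaseTimer
{
public:
  using Clock = std::chrono::steady_clock;

  explicit LiftReleaseTimer(std::chrono::seconds grace)
  : _grace(grace)
  {
    assert(grace.count() >= 0);
  }

  // Call on every update. `session_needed` is false when the heuristic
  // believes the robot holds a lift session it does not need. Returns true
  // only after the session has looked unneeded for the whole grace period
  // without interruption.
  bool should_release(bool session_needed, Clock::time_point now)
  {
    if (session_needed)
    {
      // The session is needed again (e.g. the replan finished), so any
      // pending countdown is discarded.
      _unneeded_since.reset();
      return false;
    }

    if (!_unneeded_since.has_value())
      _unneeded_since = now;

    return now - *_unneeded_since >= _grace;
  }

private:
  std::chrono::seconds _grace;
  std::optional<Clock::time_point> _unneeded_since;
};
```

With a 30-second grace period, a replan that makes the session look unneeded for only a moment resets the countdown as soon as the session is needed again, so the release never fires during that window.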

Remaining Issues / Rationale:

This case will be taken into serious consideration as we work on the next generation traffic + resource locking mechanisms.

cwrx777 commented 3 months ago

Is this PR good to merge to main?

mxgrey commented 3 months ago

Thanks for the reminder @cwrx777