osrf / rmf_core

Provides the centralized functions of RMF: scheduling, etc.
Apache License 2.0

Fleet keeps going back-and-forth / deadlock during replanning and negotiation #145

Closed Briancbn closed 3 years ago

Briancbn commented 4 years ago

Solving this might actually indirectly solve #144, but it may be a different problem in nature.

We noticed that when running rmf_demos:office_loop_conflict.launch.xml, there are two major problems when resolving conflicts

The test was done in simulation, and the same thing happens in the world we set up. Is this expected behaviour from the fleet adapter, or is there any way to avoid this problem? If this problem is known, what is the current plan for addressing it?

mxgrey commented 4 years ago

Whenever reporting a behavioral issue, please try to provide video recordings of what's happening. Verbal descriptions are not likely to be sufficient to identify the problem.

Are the deadlocks happening around doors or lifts? If so, there are some known issues with negotiating through the use of such infrastructure, and we're already working on tying up those loose threads. Trying to negotiate through doorways is the most common cause of the back-and-forth behavior that you're describing.

I'm surprised to hear that head-to-head conflicts along a non-door lane would result in a deadlock, but one possible cause I can think of is if one of the vehicles has a loop request goal waypoint in the middle of the deadlocked zone. The loop request is a half-baked task request that was only meant for demo purposes, so there are some rules for using it safely, such as that the endpoints should only be leaf nodes on the navigation graph.

mxgrey commented 4 years ago

One other possible cause of a head-to-head deadlock is an inconsistency between the vicinity parameters of the fleet adapter and the parameters of the simulated robot. Unfortunately the current pipeline has two sources of truth for that. I'll double-check those parameters as soon as I get the chance.
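A mismatch like the one described above can be caught with a simple consistency check. The sketch below is purely illustrative: the parameter names (`footprint_radius`, `vicinity_radius`) and values are hypothetical placeholders, not the actual configuration keys used by the fleet adapter or the simulation.

```python
# Hypothetical parameter sets: in the current pipeline the fleet adapter
# and the simulation each carry their own copy of these radii, so a
# sanity check like this can catch drift between the two sources of truth.
fleet_adapter_params = {"footprint_radius": 0.3, "vicinity_radius": 1.0}
simulation_params = {"footprint_radius": 0.3, "vicinity_radius": 0.6}

def find_mismatches(a, b, tolerance=1e-6):
    """Return the names of parameters whose values disagree beyond tolerance."""
    return [k for k in a if abs(a[k] - b[k]) > tolerance]

print(find_mismatches(fleet_adapter_params, simulation_params))
# ['vicinity_radius']
```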

Briancbn commented 4 years ago

Thanks @mxgrey for the prompt reply

> Whenever reporting a behavioral issue, please try to provide video recordings of what's happening. Verbal descriptions are not likely to be sufficient to identify the problem.

> Are the deadlocks happening around doors or lifts? If so, there are some known issues with negotiating through the use of such infrastructure, and we're already working on tying up those loose threads. Trying to negotiate through doorways is the most common cause of the back-and-forth behavior that you're describing.

As is shown in the first video, the back-and-forth happens even without a door involved. However, we also notice this behavior when negotiating through a door, and it leads to the significant memory increases mentioned in #129.

> One other possible cause of a head-to-head deadlock is an inconsistency between the vicinity parameters of the fleet adapter and the parameters of the simulated robot.

This does occur for the MiR100; we have noticed it. For the Magni, however, the logs show that collisions between robots are not reported by gzserver first, which means the fleet adapter should be detecting the conflict first.

> so there are some rules for using it safely, like the endpoints should only be leaf nodes on the navigation graph.

Do leaf nodes mean the endpoints of the navigation graph?

mxgrey commented 4 years ago

Thanks for the videos, they're very insightful.

Keep going back and forth

It looks to me like there are two problems happening here at the same time:

  1. The simulated magni2 is not doing a very good job of keeping up with its scheduled plan. This implies that some simulation parameters may need to be tweaked. Perhaps friction or motor parameters need to be improved so that the simulated robot can track its commanded itinerary better. Or if that doesn't help, then maybe we need to adjust the expected velocity and acceleration for the robot so that we get a better prediction of what it can do.

  2. The fleet adapter is not doing a great job of recognizing where the robot is on its navigation graph. This is related to the API problem that I mentioned in #146.

Solving either of these problems should eliminate the backtracking behavior. Problem (2) should be addressed by some ongoing work to update the simulation to use the newer fleet adapter API.
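Problem (1) above amounts to the robot drifting away from where the schedule expects it to be. A minimal sketch of that kind of lag check, with a made-up straight-line plan and an assumed observed pose (none of this reflects the actual fleet adapter internals):

```python
# Hypothetical check for "falling behind the plan": compare where the
# schedule expects the robot to be at time t against where it actually is.
def expected_position(waypoints, t):
    """Linearly interpolate a scheduled path of (time, x, y) waypoints."""
    for (t0, x0, y0), (t1, x1, y1) in zip(waypoints, waypoints[1:]):
        if t0 <= t <= t1:
            a = (t - t0) / (t1 - t0)
            return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))
    return waypoints[-1][1:]  # past the end of the plan: hold the goal

plan = [(0.0, 0.0, 0.0), (10.0, 5.0, 0.0)]  # 5 m in 10 s along x
actual = (3.0, 0.0)                          # observed pose at t = 8 s
ex, ey = expected_position(plan, 8.0)        # schedule expects (4.0, 0.0)
lag = ((ex - actual[0]) ** 2 + (ey - actual[1]) ** 2) ** 0.5
print(round(lag, 2))  # 1.0 (metres behind schedule)
```

A persistent, growing lag like this is what would trigger replanning, and with the older API the adapter may then misjudge which lane the robot is actually on.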

Deadlock

Is M4 a start or end waypoint in a loop request for magni2? It looks to me like magni2 is reporting itself as briefly not existing on the schedule while it sits at M4. That's not surprising if M4 is a start or end waypoint in a loop request, because the loop request implementation was not designed to be robust to this use case.

Effectively what's happening here is magni_2 is reporting to the traffic schedule that it won't exist in physical space for some number of seconds. This causes magni_1 to think it's free to move through the M4 waypoint, even though in real life the magni_2 is blocking it. This kind of non-existence is only supposed to happen when a robot is finished with its task and parked in a designated parking spot. Unfortunately the "loop" task request is not very well designed (it was only made for demo purposes), so there will be a brief period of "non-existence" each time the robot reaches a start or end waypoint.
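The "brief non-existence" described above can be pictured as a gap between the intervals of a robot's scheduled itinerary. This is only an illustrative sketch, assuming itineraries can be reduced to (start, end) time intervals; the real schedule API is more involved.

```python
# Hypothetical itinerary: (start_time, end_time) intervals, in seconds,
# during which the robot's trajectory exists on the traffic schedule.
itinerary = [(0.0, 12.0), (15.5, 30.0)]  # no trajectory from 12.0 to 15.5

def nonexistence_gaps(segments):
    """Return the windows where the robot reports no scheduled trajectory."""
    ordered = sorted(segments)
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start > prev_end
    ]

print(nonexistence_gaps(itinerary))  # [(12.0, 15.5)]
```

During a gap like (12.0, 15.5), other participants plan as if the robot is absent, which is exactly what lets magni_1 route through the waypoint magni_2 is physically blocking.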

If M4 is not a start or end waypoint, then I'll need to investigate this further. If you're able to share some launch files that reliably recreate this condition, then I'll be happy to use it as a test case in debugging.

> Do leaf nodes mean the endpoints of the navigation graph?

The way I would define a leaf node is a waypoint that only connects to one other waypoint. If you always use leaf nodes as the start and end waypoints of a loop request, then deadlocks shouldn't happen. If you find that deadlocks do happen despite diligently using only leaf nodes as start and end waypoints, then I'll need to investigate that further.
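The leaf-node rule can be made concrete with a small sketch. The graph layout and waypoint names below are hypothetical, not taken from the demo maps: a waypoint qualifies as a leaf when exactly one lane connects it to the rest of the navigation graph.

```python
# Hypothetical navigation graph as an adjacency list:
# waypoint name -> set of neighboring waypoints reachable by a lane.
nav_graph = {
    "pantry": {"M1"},
    "M1": {"pantry", "M2", "M4"},
    "M2": {"M1", "M3"},
    "M3": {"M2"},
    "M4": {"M1"},
}

def is_leaf(graph, waypoint):
    """A leaf node connects to exactly one other waypoint."""
    return len(graph[waypoint]) == 1

# Waypoints that would be safe loop-request endpoints under the rule above:
safe_endpoints = sorted(w for w in nav_graph if is_leaf(nav_graph, w))
print(safe_endpoints)  # ['M3', 'M4', 'pantry']
```

Under this rule, a junction like M1 would be an unsafe loop endpoint, since other robots may need to pass through it while the looping robot briefly drops off the schedule there.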

mxgrey commented 3 years ago

I'm closing this issue for a lack of recent activity. Feel free to reopen if there is anything more to report.