open-rmf/rmf: Root repository for the RMF software

two robots wait for each other for about 2.5 minutes when passing through one intersection #356

ghost opened this issue 1 year ago

ghost commented 1 year ago

Bug report

Required information:

- Operating system and version: Ubuntu 22.04
- OpenRMF installation type: from source
- OpenRMF version or commit hash: main branch, fc1f9983b8634b839aeddf686d97c21e9e50beb1
- ROS distribution and version: Humble
- ROS installation type: binary
- Package or library, if applicable:

Description of the bug

I think this is a bug in the traffic planner. Both robots need to pass through the same intersection to reach their respective destinations. Randomly, both of them wait at the waypoints near that intersection; it seems each robot is waiting for the other to pass through first. About 2.5 minutes later, one robot is commanded to pass through first and the conflict is resolved.

Steps to reproduce the bug

For the test scenario and logs, please refer to the following link: https://gist.github.com/stella-ccyydy/5e8401c1a460f2f51a15d54aa07a1ed2

Expected behavior

What I expect is that RMF should immediately determine which robot goes first.

ghost commented 1 year ago

@mxgrey @Yadunund Hi, could you please investigate this issue? If you need any more information, please let me know. Thanks, Stella

mxgrey commented 1 year ago

Are you able to share the building.yaml file with us so we can run the scenario locally? Also we'll need to know the robot parameters, such as footprint, vicinity, and velocity.

ghost commented 1 year ago

> Are you able to share the building.yaml file with us so we can run the scenario locally? Also we'll need to know the robot parameters, such as footprint, vicinity, and velocity.

test.building.txt

limits:
  linear: [0.7, 0.75] # velocity, acceleration
  angular: [0.3, 2.0] # velocity, acceleration
profile: # Robot profile is modelled as a circle
  footprint: 0.3 # radius in m
  vicinity: 0.6 # radius in m
reversible: False # whether robots in this fleet can reverse

And please note that this issue happens randomly.

ghost commented 1 year ago

Recently, the same issue happened again several times. Could you please explain in what kind of case one robot will choose to wait for the other robot to pass through first during the negotiation phase?

Now the problem is that both robots choose to wait at the beginning, and about 2.5 minutes later they suddenly decide that one robot goes first and the other goes second. Is this related to some timer timeout?

mxgrey commented 1 year ago

Sorry for the delay, I've recently returned from some PTO so I'm able to investigate this now.

I've built a simulation off of the building file and robot parameters that you've provided. After about a dozen runs I never saw the issue that you described.

Are you able to reproduce this issue in a simulation and provide us a package with a launch file that reproduces it, or is the issue only happening with physical hardware and/or a custom fleet adapter in the loop?

From looking at the scenario it doesn't appear to be one of the edge cases where negotiation is known to have issues. I don't think it's very likely to be a planning or negotiation problem unless there are details you've left out, like there are other robots present.

One possible explanation, if you're seeing this happen with a custom fleet adapter, is a race condition in the command handle. If you're not guarding your follow_new_path implementation with mutexes (see this example), then it's possible that your fleet adapter's path command is getting jumbled when a sudden negotiation happens, and the result is that the robots sit around waiting until the go_to_place activity decides that your integration code has been silent for too long, at which point it will resend the command.
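
To make the mutex suggestion concrete, here is a minimal sketch assuming the Python rmf_adapter bindings (as used by the rmf_demos fleet adapter); the robot-dispatch details are elided and the field names are illustrative:

    import threading

    import rmf_adapter as adpt


    class CommandHandle(adpt.RobotCommandHandle):
        def __init__(self):
            adpt.RobotCommandHandle.__init__(self)
            # One lock guards all state shared between RMF callbacks and
            # the robot-control thread.
            self._lock = threading.Lock()
            self._waypoints = []
            self._path_finished_callback = None

        def follow_new_path(self, waypoints, next_arrival_estimator,
                            path_finished_callback):
            # A negotiation can interrupt at any moment with a replacement
            # path; holding the lock keeps the old and new commands from
            # interleaving.
            with self._lock:
                self._waypoints = list(waypoints)
                self._path_finished_callback = path_finished_callback
                # ... dispatch the new path to the robot here ...

        def stop(self):
            with self._lock:
                self._waypoints = []
                # ... command the robot to halt here ...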

If you're able to get a screen capture (a video recording would be even better) of RViz with the schedule visualization (yellow dots, purple dots, and green lines) while the issue is happening, that might give us some more clues about what's going on. ROS 2 log output from the fleet adapter would also be good to see.

ghost commented 1 year ago

> Are you able to reproduce this issue in a simulation and provide us a package with a launch file that reproduces it, or is the issue only happening with physical hardware and/or a custom fleet adapter in the loop?

Currently the issue happens with my custom fleet adapter. In fact, I have not run many tests using rmf_demos, so I am not sure whether this issue will happen there.

> One possible explanation, if you're seeing this happen with a custom fleet adapter, is a race condition in the command handle. If you're not guarding your follow_new_path implementation with mutexes (see this example), then it's possible that your fleet adapter's path command is getting jumbled when a sudden negotiation happens, and the result is that the robots sit around waiting until the go_to_place activity decides that your integration code has been silent for too long, at which point it will resend the command.

I am sure I guard my follow_new_path implementation with mutexes.

> If you're able to get a screen capture (a video recording would be even better) of RViz with the schedule visualization (yellow dots, purple dots, and green lines) while the issue is happening, that might give us some more clues about what's going on. ROS 2 log output from the fleet adapter would also be good to see.

OK, I will try to reproduce this issue and provide the materials you need ASAP.

ghost commented 1 year ago

Hi @mxgrey

Sorry for the late response; recently I was tied up with some other tricky issues. Now I have time to summarize the test results.

https://github.com/open-rmf/rmf/assets/118786311/a978e346-f4cc-4aa2-a4be-4eea3edefc1b

The time frame for the test is 2023-06-09 17:26:52 to 2023-06-09 17:36:45. The tasks for the three robots are:

- GL1: from P4 to P6
- GL2: from P8 to P6
- GL3: from P16 to P6

At 2023-06-09 17:26:52, I submitted tasks for robots GL1, GL2, and GL3 almost at the same time. From the video we can see:

- At 00:00:03, GL1, GL2, and GL3 were all given paths to P6.
- At 00:00:11, GL3 arrived at P6 first and began to perform its action (invoking self.update_handle.unstable_declare_holding; see the sketch after this list).
- While GL3 performs its action at P6, my adapter prevents GL1 and GL2 from going to P6 by stopping them, so GL1 waits at P5 and GL2 waits at P7.
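
For reference, here is a minimal sketch of how an adapter might declare holding while a robot performs an action at a waypoint. This is not taken from Stella's adapter; the argument names and the use of datetime.timedelta for the duration are assumptions about the Python binding's signature:

    from datetime import timedelta

    def begin_action_at_waypoint(self, map_name, position):
        # Tell the traffic schedule that this robot will occupy its
        # current position (e.g. P6) for a while, so other participants
        # plan around it rather than through it.
        # position is assumed to be [x, y, yaw]; the duration is assumed
        # to accept a datetime.timedelta.
        self.update_handle.unstable_declare_holding(
            map_name,                # e.g. "L1"
            position,                # [x, y, yaw] at the waypoint
            timedelta(seconds=60))   # assumed holding duration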

At 2023-06-09 17:27:07, I submitted a task to GL3 requesting it to leave P6 and go to P16. From the video, we can see that at 00:00:26 GL3 arrived at P16, which means it released waypoint P6. From 00:00:26 onward, new paths were continuously received from RMF core for GL1 and GL2. But the new paths did not navigate GL1 or GL2 to P6; instead they requested both robots to stand still.

The rmf_adapter log: python3_8574_1686302215757.log. You can focus on the log after line 96. The highlighted lines indicate receiving the tasks for GL1, GL2, and GL3 (all robots to P6).

[screenshot: highlighted rmf_adapter log]

In the picture below, the highlighted lines indicate receiving the task for GL3 to go from P6 to P16.

[screenshot: highlighted rmf_adapter log]

I hope this is helpful for investigating the issue. If you need any more information, please feel free to let me know. Thanks, Stella

ghost commented 1 year ago

@mxgrey Please find the rmf_traffic_schedule log for the above test at the link below. Hope it helps. rmf_traffic_schedule_8548_1686302215327.log

mxgrey commented 1 year ago

Okay, an important detail here that I didn't catch in your original description is that the robots all have the same destination.

When you said

> Both robots need to pass through the same intersection

I thought you meant that they need to negotiate their way across an intersection and they each have their own destinations beyond that intersection. Instead they all have the same destination at the center of the intersection.

There's a discussion here that talks about the problems with sending multiple robots to the same destination at the same time. We're developing a reservation system and a queuing system to sort those situations out automatically, but in the meantime, if you're using direct robot task requests, then I suggest making sure you don't issue multiple tasks with the same destination at the same time.
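
As a stopgap, here is a minimal dispatcher-side sketch of that suggestion; try_submit_go_to, submit_task, and release_destination are hypothetical names in the integrator's own code, not RMF APIs:

    import threading

    # Track destination waypoints that already have a robot headed there.
    _claims_lock = threading.Lock()
    _claimed_destinations = set()

    def try_submit_go_to(robot_name, destination, submit_task):
        """submit_task is your own task-submission hook (hypothetical)."""
        with _claims_lock:
            if destination in _claimed_destinations:
                return False  # hold this request instead of racing another robot
            _claimed_destinations.add(destination)
        submit_task(robot_name, destination)
        return True

    def release_destination(destination):
        # Call this when the robot completes its task or leaves the waypoint.
        with _claims_lock:
            _claimed_destinations.discard(destination)

With a guard like this, a second request to P6 would be refused (returning False) until the first robot's task there completes and releases the claim.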

ghost commented 1 year ago

> I thought you meant that they need to negotiate their way across an intersection and they each have their own destinations beyond that intersection. Instead they all have the same destination at the center of the intersection.

I'm pretty sure that when I first encountered this issue, it did happen in the scenario you described above; that is, the two robots had different destinations and needed to negotiate their way across an intersection. Actually, the issue is not easily reproduced in that scenario, so I designed the test case with multiple robots having the same destination in order to reproduce the issue more easily.

mxgrey commented 1 year ago

I sympathize with the difficulty of reproducing non-deterministic issues, but the scenario that you've reproduced here will not be supported until the reservation system has been incorporated into RMF. I can't use this information to debug possible behavioral issues that robots might have when they're passing each other at intersections because that would have different causes than what we'll see in this scenario.

I recently updated the roadmap for Open-RMF. Some of the upcoming features will include simpler APIs and debugging+logging tools to help recreate whatever situations occur during testing. I think those upcoming improvements will solve the type of problem that you're describing, or at a minimum allow you to more easily share the scenarios with us so we can reproduce and debug them on our end.