osrf / subt

This repostory contains software for the virtual track of the DARPA SubT Challenge. Within this repository you will find Gazebo simulation assets, ROS interfaces, support scripts and plugins, and documentation needed to compete in the SubT Virtual Challenge.
Other
305 stars 98 forks source link

Breadcrumbs not being deployed #655

Open zwn opened 3 years ago

zwn commented 3 years ago

In run cb38d467-8aa8-483b-99ce-9e537a6e20fb we have two robots deploying breadcrumbs - the names of the robots are C600L and D40W600L. According to our data each robot should have deployed 6 breadcrumbs, however only robot D40W600L deployed them (both robots run identical code). I was able to confirm that only D40W600L breadcrumbs are being deployed by inspecting state.tlog file. While trying to pinpoint the problem I was able to find all 6 lines containing receiving: deploy [] in rosout.log for C600L which are being printed exactly before the ROS Empty message is published to the /C600L/breadcrumb/deploy topic. Further I can see in logfile C600L-ros_ign_bridge_breadcrumbs-33-stdout.log that this bridge has received the message at least once. Then in logfile rostopic_stats_logger.log I see 4 times the entry for /C600L/breadcrumb/deploy so at least 4 breadcumbs deploy commands have been received by /C600L/ros_ign_bridge_breadcrumbs.

To summarize: We published 6 times on the topic /C600L/breadcrumb/deploy, 4 times it has been received by /C600L/ros_ign_bridge_breadcrumbs but no breadcrumbs have been deployed.

zwn commented 3 years ago

We see a similar thing happening in 46126b6f-1b8f-48a2-95a0-81e6e65db59b - only the working and non-working robot have switched. I haven't done such a thorough analysis this time but I expect to find the same problem.

nkoenig commented 3 years ago

This is just a status update. We are looking at the logs.

zwn commented 3 years ago

Any progress on this? The deadline for cave is rapidly approaching (I bet you already knew that :wink:).

nkoenig commented 3 years ago

We are still trying to reproduce this issue.

arjo129 commented 3 years ago

This happened to me on run 1e917331-cf3d-474b-a730-9d507ed787a0 one of my robots has definitely triggered breadcrumb deployments (X1) but it seems like the deployments did not work.

peci1 commented 3 years ago

In simulation e097b600-5ff5-4ce0-a687-8079f46d0815, this happened to our robot X3. Topic statistics show that the deploy commands were issued, but no breadcrumb from X3 appeared.

peci1 commented 3 years ago

But it doesn't happen often, this is actually the first time we noticed.

m3d commented 3 years ago

X3? Is not X3 drone (without breadcrumbs)?

peci1 commented 3 years ago

Sorry for the confusion, we name ground robots as X1-X3, although all of them are EXPLORER_X1 :-D

nkoenig commented 3 years ago

Ignition topic statistics have been enabled in the release on Feb. 12.

You will now see the following in the Ignition Gazebo logs, assuming the robot's name is X1:

On successful transmission: Received breadcrumb deployment for X1__breadcrumb__ If a message was dropped: Dropped message count of N for breadcrumbs on model X1__breadcrumb__

peci1 commented 3 years ago

It has to get lost in the ROS-IGN bridge. Have a look at simulation 4bb3e674-e97e-4b11-a17e-295c6d95f020 .

Robot X3 is reporting in our ROS log that it has sent a deploy command at time 191. Topic statistics of the deploy command tell that there was exactly 1 message passed and 0 lost on the deploy topic in the relevant time frame. But no breadcrumb was dropped in reality and the server log contains neither the confirmation message nor the dropped message warning for this particular time. I see the confirmation messages for other breadcrumbs dropped later (which means the sim was already running on the updated dockers), but at time 191, the simulator didn't get a request to drop a breadcrumb.

My conclusion is that the ROS part of the ROS-IGN bridge receives the message but doesn't succeed calling the Ignition service.

Would it be possible to route the breadcrumbs/remaining Ign topic to the solution containers as a ROS topic? This way, we could check if the number of remaining breadcrumbs has decreased after we instructed a drop. With some hysteresis, it should be possible to write a controller that stops the robot and retries deploying until it succeeds.

zwn commented 3 years ago

Would it be possible to route the breadcrumbs/remaining Ign topic to the solution containers as a ROS topic?

Either that or the breadcrumbs just should be a ROS service and this "making sure it does the thing"-stuff should be handled inside it.

On a related note: zeromq pub-sub sockets as-is are not a reliable transfer either - a lot of work needs to be done to make it so, see fro example https://zguide.zeromq.org/docs/chapter5/#Reliable-Pub-Sub-Clone-Pattern, so some kind of confirmation & repetition is going to be needed anyway.

nkoenig commented 3 years ago

Checking in to see if anyone has seen this issue recently?

m3d commented 3 years ago

Checking in to see if anyone has seen this issue recently?

just side note, that the breadcrumbs are sometimes not visualized in web view so it is necessary to reload or download logs to check.

m3d commented 3 years ago

Checking in to see if anyone has seen this issue recently?

OK, now I have fresh run (robotika, ver104sym, 4115228d-3dac-4c56-a98c-7c27ae383857). Two robots A900L and B900R, where both should deploy breadcrumbs but only A did:

A900L__breadcrumb___0: -5.50 29.20 -0.00
A900L__breadcrumb___1: 24.15 29.93 -0.00
A900L__breadcrumb___2: 54.28 31.26 -0.00
A900L__breadcrumb___3: 84.15 28.90 -0.00
A900L__breadcrumb___4: 52.50 -19.60 5.00
A900L__breadcrumb___5: 82.49 -18.48 5.00

and B reported "fake" breadcrumb position, so there are holes in communication. Note, that check if we deployed breadcrumb is not implemented on our side, but never the less it would not help as it failed 6 times, right?

fq-sym (breadcrumbs are the pink triangles, expected spacing 30 meters, white and green are the robot trajectories)

m3d commented 3 years ago

@nkoenig - which file I should use for debugging? I see rostopic_stats_logger.log where for both robots have 4 records with keyword "breadcrumb":

---
topic: "/A900L/breadcrumb/deploy"
node_pub: "/cloudsim2osgar"
node_sub: "/A900L/ros_ign_bridge_breadcrumbs"
window_start: 
  secs: 132
  nsecs: 340000000
window_stop: 
  secs: 178
  nsecs: 240000000
delivered_msgs: 1
dropped_msgs: 0
traffic: 0
period_mean: 
  secs: 0
  nsecs:         0
period_stddev: 
  secs: 0
  nsecs:         0
period_max: 
  secs: 0
  nsecs:         0
stamp_age_mean: 
  secs: 0
  nsecs:         0
stamp_age_stddev: 
  secs: 0
  nsecs:         0
stamp_age_max: 
  secs: 0
  nsecs:         0
--- 

for B:

---
topic: "/B900R/breadcrumb/deploy"
node_pub: "/cloudsim2osgar"
node_sub: "/B900R/ros_ign_bridge_breadcrumbs"
window_start: 
  secs: 126
  nsecs: 968000000
window_stop: 
  secs: 158
  nsecs: 808000000
delivered_msgs: 1
dropped_msgs: 0
traffic: 0
period_mean: 
  secs: 0
  nsecs:         0
period_stddev: 
  secs: 0
  nsecs:         0
period_max: 
  secs: 0
  nsecs:         0
stamp_age_mean: 
  secs: 0
  nsecs:         0
stamp_age_stddev: 
  secs: 0
  nsecs:         0
stamp_age_max: 
  secs: 0
  nsecs:         0
--- 
m3d commented 3 years ago

p.s. in server_console.log I can see:

(2021-04-13T05:20:38.433555050) [Msg] Breadcrumbs subscribing to deploy messages on [/model/dynamic_rocks_1/breadcrumbs/Rock/deploy] 
...
(2021-04-13T05:37:49.980333671) [Msg] Received breadcrumb deployment for A900L__breadcrumb__
(2021-04-13T05:37:50.395836353) [Msg] Deploying A900L__breadcrumb___0 at -16.003 29.1998 0.1275 -0 0 1.81078 
...
(2021-04-13T05:38:23.429429494) [Msg] Breadcrumb 'A900L__breadcrumb___0' is now static. 
...

where the first message sounds strange but the functionality was probably reused for falling rocks too?? Note, that there is no record about deployment of B900R breadcrumbs, except

(2021-04-13T05:20:41.609652963) [Msg] Breadcrumbs subscribing to deploy messages on [/model/B900R/breadcrumb/deploy]
m3d commented 3 years ago

Another robotika run from yesterday, Tunnel Circuit 4, e13697f7-ed1c-4727-b5dc-a9204d8f80d2 - two Freyja robots. It scored only thanks to the fact that both robots returned to the base. state.tlog does not contain any record and our Teambase contains two breadcrumbs placement and then the connection was lost.

0:03:32.608000 22 [30.000570853814416, -0.3531142521161632, 0.12801069578431462]
0:05:57.360914 22 [60.03632767774239, -0.3953938122968248, 0.12796653574219774]

What is the plan with this bug? (unfortunately this way I was not able to check if new radio signal worked better)

nkoenig commented 3 years ago

Additional logging has been added to the bridge. Please keep the missing breadcrumb notifications coming.

caguero commented 3 years ago

Additional logging has been added to the bridge. Please keep the missing breadcrumb notifications coming.

It will help a lot if you can also let us know the time where the breadcrumb deployment was requested.

m3d commented 3 years ago

@caguero - do you mean with this new logger not to dig it out from old logfiles, right?

caguero commented 3 years ago

@caguero - do you mean with this new logger not to dig it out from old logfiles, right?

Right, exactly as you did but hopefully we'll get more information to figure out where the problem could be.

peci1 commented 3 years ago

Carlos, you've just told me about IGN_TRANSPORT_RCVHWM and IGN_TRANSPORT_SNDHWM elsewhere. Could this be used here to help? Why not setting them to infinity for SubT?

caguero commented 3 years ago

Carlos, you've just told me about IGN_TRANSPORT_RCVHWM and IGN_TRANSPORT_SNDHWM elsewhere. Could this be used here to help? Why not setting them to infinity for SubT?

Although tempting, if the buffers are set to infinity, that means that the entire application can crash because we exhausted the memory. Unless it's a very controlled and bounded application I can't recommend settings the buffers to infinite.

peci1 commented 3 years ago

Unless it's a very controlled and bounded application

Isn't SubT such? You know the maximum number of robots, you know more or less how many sensors there can be and which topics will be present... And you know the specs of the S3 virtual machine that's running the simulation. It might be interesting to check the memory consumption of the sim container in a running simulation. We usually fit in a few GBs.

m3d commented 3 years ago

FYI 0d0656a7-971b-4bae-8cc2-fa0ab14a64f9 fails to deploy breadcrumbs ... again. Please let me know what to check once UC4 run finishes. Thanks m.

m3d commented 3 years ago

@nkoenig - regarding now terminated 0d0656a7-971b-4bae-8cc2-fa0ab14a64f9 run, in particular ver115uc4r1-0d0656a7-971b-4bae-8cc2-fa0ab14a64f9-C1C2700R.tar.gz file:

and in ~12GB large log file is

---
topic: "/C1C2700R/breadcrumb/deploy"
node_pub: "/cloudsim2osgar"
node_sub: "/C1C2700R/ros_ign_bridge_breadcrumbs"
window_start: 
  secs: 107
  nsecs: 220000000
window_stop: 
  secs: 130
  nsecs: 204000000
delivered_msgs: 1
dropped_msgs: 0
traffic: 0
period_mean: 
  secs: 0
  nsecs:         0
period_stddev: 
  secs: 0
  nsecs:         0
period_max: 
  secs: 0
  nsecs:         0
stamp_age_mean: 
  secs: 0
  nsecs:         0
stamp_age_stddev: 
  secs: 0
  nsecs:         0
stamp_age_max: 
  secs: 0
  nsecs:         0
---

thanks m.