Open zbynekwinkler opened 4 years ago
We see a similar thing happening in 46126b6f-1b8f-48a2-95a0-81e6e65db59b
- only the working and non-working robot have switched. I haven't done such a thorough analysis this time but I expect to find the same problem.
This is just a status update. We are looking at the logs.
Any progress on this? The deadline for cave is rapidly approaching (I bet you already knew that :wink:).
We are still trying to reproduce this issue.
This happened to me on run 1e917331-cf3d-474b-a730-9d507ed787a0
one of my robots has definitely triggered breadcrumb deployments (X1
) but it seems like the deployments did not work.
In simulation e097b600-5ff5-4ce0-a687-8079f46d0815, this happened to our robot X3. Topic statistics show that the deploy commands were issued, but no breadcrumb from X3 appeared.
But it doesn't happen often, this is actually the first time we noticed.
X3? Is not X3 drone (without breadcrumbs)?
Sorry for the confusion, we name ground robots as X1-X3, although all of them are EXPLORER_X1 :-D
Ignition topic statistics have been enabled in the release on Feb. 12.
You will now see the following in the Ignition Gazebo logs, assuming the robot's name is X1
:
On successful transmission: Received breadcrumb deployment for X1__breadcrumb__
If a message was dropped: Dropped message count of N for breadcrumbs on model X1__breadcrumb__
It has to get lost in the ROS-IGN bridge. Have a look at simulation 4bb3e674-e97e-4b11-a17e-295c6d95f020 .
Robot X3 is reporting in our ROS log that it has sent a deploy command at time 191. Topic statistics of the deploy command tell that there was exactly 1 message passed and 0 lost on the deploy topic in the relevant time frame. But no breadcrumb was dropped in reality and the server log contains neither the confirmation message nor the dropped message warning for this particular time. I see the confirmation messages for other breadcrumbs dropped later (which means the sim was already running on the updated dockers), but at time 191, the simulator didn't get a request to drop a breadcrumb.
My conclusion is that the ROS part of the ROS-IGN bridge receives the message but doesn't succeed calling the Ignition service.
Would it be possible to route the breadcrumbs/remaining
Ign topic to the solution containers as a ROS topic? This way, we could check if the number of remaining breadcrumbs has decreased after we instructed a drop. With some hysteresis, it should be possible to write a controller that stops the robot and retries deploying until it succeeds.
Would it be possible to route the
breadcrumbs/remaining
Ign topic to the solution containers as a ROS topic?
Either that or the breadcrumbs just should be a ROS service and this "making sure it does the thing"-stuff should be handled inside it.
On a related note: zeromq pub-sub sockets as-is are not a reliable transfer either - a lot of work needs to be done to make it so, see fro example https://zguide.zeromq.org/docs/chapter5/#Reliable-Pub-Sub-Clone-Pattern, so some kind of confirmation & repetition is going to be needed anyway.
Checking in to see if anyone has seen this issue recently?
Checking in to see if anyone has seen this issue recently?
just side note, that the breadcrumbs are sometimes not visualized in web view so it is necessary to reload or download logs to check.
Checking in to see if anyone has seen this issue recently?
OK, now I have fresh run (robotika, ver104sym
, 4115228d-3dac-4c56-a98c-7c27ae383857
). Two robots A900L
and B900R
, where both should deploy breadcrumbs but only A did:
A900L__breadcrumb___0: -5.50 29.20 -0.00
A900L__breadcrumb___1: 24.15 29.93 -0.00
A900L__breadcrumb___2: 54.28 31.26 -0.00
A900L__breadcrumb___3: 84.15 28.90 -0.00
A900L__breadcrumb___4: 52.50 -19.60 5.00
A900L__breadcrumb___5: 82.49 -18.48 5.00
and B reported "fake" breadcrumb position, so there are holes in communication. Note, that check if we deployed breadcrumb is not implemented on our side, but never the less it would not help as it failed 6 times, right?
(breadcrumbs are the pink triangles, expected spacing 30 meters, white and green are the robot trajectories)
@nkoenig - which file I should use for debugging? I see rostopic_stats_logger.log
where for both robots have 4 records with keyword "breadcrumb":
---
topic: "/A900L/breadcrumb/deploy"
node_pub: "/cloudsim2osgar"
node_sub: "/A900L/ros_ign_bridge_breadcrumbs"
window_start:
secs: 132
nsecs: 340000000
window_stop:
secs: 178
nsecs: 240000000
delivered_msgs: 1
dropped_msgs: 0
traffic: 0
period_mean:
secs: 0
nsecs: 0
period_stddev:
secs: 0
nsecs: 0
period_max:
secs: 0
nsecs: 0
stamp_age_mean:
secs: 0
nsecs: 0
stamp_age_stddev:
secs: 0
nsecs: 0
stamp_age_max:
secs: 0
nsecs: 0
---
for B:
---
topic: "/B900R/breadcrumb/deploy"
node_pub: "/cloudsim2osgar"
node_sub: "/B900R/ros_ign_bridge_breadcrumbs"
window_start:
secs: 126
nsecs: 968000000
window_stop:
secs: 158
nsecs: 808000000
delivered_msgs: 1
dropped_msgs: 0
traffic: 0
period_mean:
secs: 0
nsecs: 0
period_stddev:
secs: 0
nsecs: 0
period_max:
secs: 0
nsecs: 0
stamp_age_mean:
secs: 0
nsecs: 0
stamp_age_stddev:
secs: 0
nsecs: 0
stamp_age_max:
secs: 0
nsecs: 0
---
p.s. in server_console.log
I can see:
(2021-04-13T05:20:38.433555050) [Msg] Breadcrumbs subscribing to deploy messages on [/model/dynamic_rocks_1/breadcrumbs/Rock/deploy]
...
(2021-04-13T05:37:49.980333671) [Msg] Received breadcrumb deployment for A900L__breadcrumb__
(2021-04-13T05:37:50.395836353) [Msg] Deploying A900L__breadcrumb___0 at -16.003 29.1998 0.1275 -0 0 1.81078
...
(2021-04-13T05:38:23.429429494) [Msg] Breadcrumb 'A900L__breadcrumb___0' is now static.
...
where the first message sounds strange but the functionality was probably reused for falling rocks too?? Note, that there is no record about deployment of B900R breadcrumbs, except
(2021-04-13T05:20:41.609652963) [Msg] Breadcrumbs subscribing to deploy messages on [/model/B900R/breadcrumb/deploy]
Another robotika run from yesterday, Tunnel Circuit 4, e13697f7-ed1c-4727-b5dc-a9204d8f80d2
- two Freyja robots. It scored only thanks to the fact that both robots returned to the base. state.tlog
does not contain any record and our Teambase contains two breadcrumbs placement and then the connection was lost.
0:03:32.608000 22 [30.000570853814416, -0.3531142521161632, 0.12801069578431462]
0:05:57.360914 22 [60.03632767774239, -0.3953938122968248, 0.12796653574219774]
What is the plan with this bug? (unfortunately this way I was not able to check if new radio signal worked better)
Additional logging has been added to the bridge. Please keep the missing breadcrumb notifications coming.
Additional logging has been added to the bridge. Please keep the missing breadcrumb notifications coming.
It will help a lot if you can also let us know the time where the breadcrumb deployment was requested.
@caguero - do you mean with this new logger not to dig it out from old logfiles, right?
@caguero - do you mean with this new logger not to dig it out from old logfiles, right?
Right, exactly as you did but hopefully we'll get more information to figure out where the problem could be.
Carlos, you've just told me about IGN_TRANSPORT_RCVHWM
and IGN_TRANSPORT_SNDHWM
elsewhere. Could this be used here to help? Why not setting them to infinity for SubT?
Carlos, you've just told me about
IGN_TRANSPORT_RCVHWM
andIGN_TRANSPORT_SNDHWM
elsewhere. Could this be used here to help? Why not setting them to infinity for SubT?
Although tempting, if the buffers are set to infinity, that means that the entire application can crash because we exhausted the memory. Unless it's a very controlled and bounded application I can't recommend settings the buffers to infinite.
Unless it's a very controlled and bounded application
Isn't SubT such? You know the maximum number of robots, you know more or less how many sensors there can be and which topics will be present... And you know the specs of the S3 virtual machine that's running the simulation. It might be interesting to check the memory consumption of the sim container in a running simulation. We usually fit in a few GBs.
FYI 0d0656a7-971b-4bae-8cc2-fa0ab14a64f9
fails to deploy breadcrumbs ... again. Please let me know what to check once UC4 run finishes. Thanks m.
@nkoenig - regarding now terminated 0d0656a7-971b-4bae-8cc2-fa0ab14a64f9
run, in particular ver115uc4r1-0d0656a7-971b-4bae-8cc2-fa0ab14a64f9-C1C2700R.tar.gz
file:
(osgar) md@md-ThinkPad-P50:~/git/osgar$ python -m osgar.logger /data/aws/urban/UC4/ver115/C1C2700R.log | grep bread
31 radio.breadcrumb 168 | 6 | 0.0Hz
48 breadcrumbs.deploy 6 | 6 | 0.0Hz
49 breadcrumbs.location 168 | 6 | 0.0Hz
C1C2700R-ros_ign_bridge_breadcrumbs-33-stdout.log
containing:
[0m[ INFO] [1627545674.570666166, 82.380000000]: Passing message from ROS std_msgs/Empty to Ignition ignition.msgs.Empty (showing msg only once per typeESC[0m
and in ~12GB large log file is
---
topic: "/C1C2700R/breadcrumb/deploy"
node_pub: "/cloudsim2osgar"
node_sub: "/C1C2700R/ros_ign_bridge_breadcrumbs"
window_start:
secs: 107
nsecs: 220000000
window_stop:
secs: 130
nsecs: 204000000
delivered_msgs: 1
dropped_msgs: 0
traffic: 0
period_mean:
secs: 0
nsecs: 0
period_stddev:
secs: 0
nsecs: 0
period_max:
secs: 0
nsecs: 0
stamp_age_mean:
secs: 0
nsecs: 0
stamp_age_stddev:
secs: 0
nsecs: 0
stamp_age_max:
secs: 0
nsecs: 0
---
thanks m.
In run
cb38d467-8aa8-483b-99ce-9e537a6e20fb
we have two robots deploying breadcrumbs - the names of the robots areC600L
andD40W600L
. According to our data each robot should have deployed 6 breadcrumbs, however only robot D40W600L deployed them (both robots run identical code). I was able to confirm that only D40W600L breadcrumbs are being deployed by inspectingstate.tlog
file. While trying to pinpoint the problem I was able to find all 6 lines containingreceiving: deploy []
inrosout.log
for C600L which are being printed exactly before the ROS Empty message is published to the/C600L/breadcrumb/deploy
topic. Further I can see in logfileC600L-ros_ign_bridge_breadcrumbs-33-stdout.log
that this bridge has received the message at least once. Then in logfilerostopic_stats_logger.log
I see 4 times the entry for/C600L/breadcrumb/deploy
so at least 4 breadcumbs deploy commands have been received by/C600L/ros_ign_bridge_breadcrumbs
.To summarize: We published 6 times on the topic
/C600L/breadcrumb/deploy
, 4 times it has been received by/C600L/ros_ign_bridge_breadcrumbs
but no breadcrumbs have been deployed.