osrf / subt

This repository contains software for the virtual track of the DARPA SubT Challenge. Within this repository you will find Gazebo simulation assets, ROS interfaces, support scripts and plugins, and documentation needed to compete in the SubT Virtual Challenge.

Local/AWS docker simulation does not always start #225

Closed osrf-migration closed 4 years ago

osrf-migration commented 5 years ago

Original report (archived issue) by Martin Dlouhy (Bitbucket: robotikacz).


I originally suspected a wrong configuration selection (#217), but I am much more careful now, and nevertheless the robot sometimes does not react.

I use a 3-docker debug setup: sim, bridge, and solution:

gpu@gpu-desktop:~/subt_testbed$ ./run.bash osrf/subt-virtual-testbed:cloudsim_sim_latest cloudsim_sim.ign worldName:=simple_tunnel_02 robotName1:=X2L robotConfig1:=X2_SENSOR_CONFIG_4

gpu@gpu-desktop:~/subt_testbed$ ./run.bash osrf/subt-virtual-testbed:cloudsim_bridge_latest worldName:=simple_tunnel_02 robotName1:=X2L robotConfig1:=X2_SENSOR_CONFIG_4

and the 3rd is our docker image based on the subt_seed example.

On the simulation terminal I see:

[Msg] Scoring has Started

but nothing more, although the robot should already be moving.

Bridge reports:

ROS_MASTER_URI=http://localhost:11311

No processes to monitor
shutting down processing monitor...
... shutting down processing monitor complete
process[X2L/ros1_ign_bridge_imu-1]: started with pid [119]
process[tf_world_static-4]: started with pid [122]
[Dbg] [Manager.cc:421] Death of process[52] with name[x2_description].
process[X2L/ros1_ign_bridge_pose-2]: started with pid [153]
process[X2L/ros1_ign_bridge_twist-3]: started with pid [171]
process[X2L/ros1_ign_bridge_camera-4]: started with pid [184]
process[X2L/ros1_ign_image_camera-5]: started with pid [197]
process[X2L/ros1_ign_bridge_gpu_lidar-6]: started with pid [210]
process[X2L/ros1_ign_bridge_odom-7]: started with pid [222]
process[X2L/ros1_ign_bridge_battery_state-8]: started with pid [235]
process[X2L/pose_tf_broadcaster-9]: started with pid [248]

and the solution reports that it is receiving data:

[ INFO] [1570181442.395531202, 957.a527ae8c41762a101acbf2474382f82acd977df2]: received Imu 3500
[ INFO] [1570181442.395660705, 957.a527ae8c41762a101acbf2474382f82acd977df2]: received Odom 3500
[ INFO] [1570181442.473386446, 957.048000000]: received Scan 1400
[ INFO] [1570181442.473581647, 957.048000000]: received Image 1400
[ INFO] [1570181445.649678831, 959.a527ae8c41762a101acbf2474382f82acd977df2]: received Imu 3600
[ INFO] [1570181445.649816540, 959.a527ae8c41762a101acbf2474382f82acd977df2]: received Odom 3600
0:02:00.069263 (0.0 0.0 0.0) [('acc', 1825), ('image', 730), ('origin', 730), ('pose2d', 1825), ('rot', 1825), ('scan', 365), ('sim_time_sec', 36)]
0:02:00.069263 []
[ INFO] [1570181447.128823277, 959.904000000]: Python3: stdout 0:02:00.069263 (0.0 0.0 0.0) [('acc', 1825), ('image', 730), ('origin', 730), ('pose2d', 1825), ('rot', 1825), ('scan', 365), ('sim_time_sec', 36)]

but the robot is still at (0, 0, 0) even after 2 minutes of “simulation”.

Any hint what I should check? This has already happened to me, say, 5 times. I am in particular “scared” of the necessary sleep 30 in the example solution “to wait for the bridge” … isn’t there also some other “timeout” within which the solution has to respond after the sim and bridge are started?
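A fixed sleep 30 can in principle be replaced by polling until the bridge is actually ready. The sketch below is a minimal, hypothetical helper (not part of subt_seed); with rospy the `get_count` callable would be something like `publisher.get_num_connections`, stubbed out here so the sketch stays self-contained:

```python
import time

def wait_for_bridge(get_count, timeout=30.0, poll=0.5):
    """Poll until the cmd_vel publisher has at least one subscriber
    (i.e. the bridge is listening), or give up after `timeout` seconds.

    get_count: callable returning the current subscriber count; with
    rospy this would be publisher.get_num_connections (an assumption,
    injected here so the sketch needs no ROS to run).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_count() > 0:
            return True
        time.sleep(poll)
    return False
```

If this returns False, the solution could log an error and bail out early instead of silently commanding a robot that nobody is listening to.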

p.s. the robot has been sending a turn command since the beginning:

0:00:02.305059 20 b'cmd_vel 0.000000 -0.785398'
0:00:02.354023 20 b'cmd_vel 0.000000 -0.785398'

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


In a working example the bridge has this output:

ROS_MASTER_URI=http://localhost:11311

No processes to monitor
shutting down processing monitor...
process[subt_ros_relay-3]: started with pid [98]
... shutting down processing monitor complete
process[X2L/ros1_ign_bridge_imu-1]: started with pid [121]
[Dbg] [Manager.cc:421] Death of process[52] with name[x2_description].
process[tf_world_static-4]: started with pid [130]
process[X2L/ros1_ign_bridge_pose-2]: started with pid [152]
process[X2L/ros1_ign_bridge_twist-3]: started with pid [171]
process[X2L/ros1_ign_bridge_camera-4]: started with pid [184]
process[X2L/ros1_ign_image_camera-5]: started with pid [197]
process[X2L/ros1_ign_bridge_gpu_lidar-6]: started with pid [210]
process[X2L/ros1_ign_bridge_odom-7]: started with pid [227]
process[X2L/ros1_ign_bridge_battery_state-8]: started with pid [236]
process[X2L/pose_tf_broadcaster-9]: started with pid [249]

… I just re-ran all 3 dockers with no other change (I run them in sequence sim, bridge, solution, without much delay) … here I can see a different order of processes; does it matter? Is subt_ros_relay-3 important for something?

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


p.s. looking again into the broken run, the “relay” is there too, but earlier:

    ros1_ign_bridge_pose (ros1_ign_bridge/parameter_bridge)
    ros1_ign_bridge_twist (ros1_ign_bridge/parameter_bridge)
    ros1_ign_image_camera (ros1_ign_image/image_bridge)

ROS_MASTER_URI=http://localhost:11311

process[subt_ros_relay-3]: started with pid [97]
started roslaunch server http://gpu-desktop:45041/

SUMMARY
========

PARAMETERS
 * /X2L/robot_description: <?xml version="1....
 * /rosdistro: melodic
 * /rosversion: 1.14.3

NODES

ROS_MASTER_URI=http://localhost:11311

No processes to monitor
shutting down processing monitor...

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


When comparing the two bridge stdouts I see a small difference - for some reason two masters are started in both cases, but in the bad one the output is “interlaced”:

auto-starting new master
process[master]: started with pid [68]
master has started, initiating launch
master has started, initiating launch
ROS_MASTER_URI=http://localhost:11311

setting /run_id to fcbcc4b0-e686-11e9-a476-7071bc510133
... logging to /home/developer/.ros/log/fcbcc4b0-e686-11e9-a476-7071bc510133/roslaunch-gpu-desktop-53.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
... logging to /home/developer/.ros/log/fcbcc4b0-e686-11e9-a476-7071bc510133/roslaunch-gpu-desktop-52.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

Done checking log file disk usage. Usage is <1GB.

and in the good case it is not:

auto-starting new master
process[master]: started with pid [68]
master has started, initiating launch
master has started, initiating launch
ROS_MASTER_URI=http://localhost:11311

setting /run_id to e9ba5674-e68c-11e9-9b0c-7071bc510133
... logging to /home/developer/.ros/log/e9ba5674-e68c-11e9-9b0c-7071bc510133/roslaunch-gpu-desktop-53.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

... logging to /home/developer/.ros/log/e9ba5674-e68c-11e9-9b0c-7071bc510133/roslaunch-gpu-desktop-52.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

but this could be just coincidence …

osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).


I use 3 docker debug setup: sim, bridge and solution

Martin, I’m going to try to reproduce it, but is the above with the seed solution or your solution?

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


It is our docker solution, but the beginning is identical to the seed example, so it may happen there too. At the moment it is not fatal; I just have to watch it and make sure it really starts. Thanks m.

p.s. you can try to remove the 30 s sleep - on AWS it failed (again, I am not 100% sure it was related, but when I put the sleep back it worked again)

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


It just happened again :disappointed:

[ INFO] [1570278211.349145962, 51.680000000]: Python3: stdout 0:01:00.018846 (0.0 0.0 0.0) [('acc', 1666), ('image', 666), ('origin', 670), ('pose2d', 1664), ('rot', 1666), ('scan', 333), ('sim_time_sec', 34)]

i.e. not moving for one minute …

(actually 3 already, but … in short, it is not moving).

The bridge had a “normal”, separated start:

auto-starting new master
process[master]: started with pid [69]
ROS_MASTER_URI=http://localhost:11311

setting /run_id to bf72d498-e76a-11e9-b646-7071bc510133
master has started, initiating launch
master has started, initiating launch
... logging to /home/developer/.ros/log/bf72d498-e76a-11e9-b646-7071bc510133/roslaunch-gpu-desktop-53.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

... logging to /home/developer/.ros/log/bf72d498-e76a-11e9-b646-7071bc510133/roslaunch-gpu-desktop-54.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

process[rosout-1]: started with pid [84]
started core service [/rosout]
started roslaunch server http://gpu-desktop:43715/

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


p.s. this time the simulation docker proceeded:

[Msg] Publishing laser scans on [world/simple_tunnel_02/model/X0_40_100_400L/lin
k/base_link/sensor/front_laser/scan]
[GUI] [Msg] Loading plugin [ignition-rendering2-ogre2]
[GUI] [Dbg] [RenderUtil.cc:837] Create scene [scene]
[Msg] Scoring has Started
[Dbg] [LinearBatteryPlugin.cc:333] [Battery Plugin] Battery drain: 0 minutes passed.

[Dbg] [LinearBatteryPlugin.cc:333] [Battery Plugin] Battery drain: 1 minutes passed.
[Dbg] [LinearBatteryPlugin.cc:430] Battery: linear_battery
[Dbg] [LinearBatteryPlugin.cc:431] PowerLoads().size(): 1
[Dbg] [LinearBatteryPlugin.cc:433] voltage: 12.6591
[Dbg] [LinearBatteryPlugin.cc:434] state of charge: 0.909999 (q 17.9838)

[Dbg] [LinearBatteryPlugin.cc:333] [Battery Plugin] Battery drain: 2 minutes passed.
[Dbg] [LinearBatteryPlugin.cc:333] [Battery Plugin] Battery drain: 3 minutes passed.
[Dbg] [LinearBatteryPlugin.cc:430] Battery: linear_battery
[Dbg] [LinearBatteryPlugin.cc:431] PowerLoads().size(): 1

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


it just happened again … and then it worked …

osrf-migration commented 5 years ago

Original comment by Sophisticated Engineering (Bitbucket: sopheng).


Since this morning I have had many problems starting the osrf/subt-virtual-testbed docker image. It takes up to 10 minutes. Maybe it is downloading the model data (Fuel…).

Unfortunately I deleted the older version of the image. Is there any chance to pull an older version of the docker image again?

osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).


Can both of you send me the console output from these commands?

$ docker images | grep srf/subt-virtual-testbed

$ docker ps -a

$ docker version

$ docker-compose version

osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).


Can you reproduce it all the time?

How far into the run does it happen?

Is the vehicle inside the tunnel? How many?

Please also post the exact vehicle configurations when this occurs.

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


It happens only “once in a while” and it always happens at the beginning. The vehicle does not move, so it is at the start position outside the tunnel. The configuration is X2_SENSOR_CONFIG_4.

gpu@gpu-desktop:~$ docker images | grep srf/subt-virtual-testbed
osrf/subt-virtual-testbed                                    subt_solution_latest          bc0376256a85        2 weeks ago         3.36GB
osrf/subt-virtual-testbed                                    cloudsim_sim_latest           dac9410605e6        2 weeks ago         4.55GB
osrf/subt-virtual-testbed                                    cloudsim_bridge_latest        d8c6a32fef70        2 weeks ago         3.41GB
osrf/subt-virtual-testbed                                    latest                        5198acd449d2        3 weeks ago         4.49GB
osrf/subt-virtual-testbed                                    <none>                        66da30f8d158        3 weeks ago         4.49GB
osrf/subt-virtual-testbed                                    <none>                        a28b5c00f6ab        4 weeks ago         4.49GB
osrf/subt-virtual-testbed                                    <none>                        515cf3d05a94        5 weeks ago         4.49GB
osrf/subt-virtual-testbed                                    <none>                        c44241493b84        5 weeks ago         3.21GB
gpu@gpu-desktop:~$ docker ps -a
CONTAINER ID        IMAGE                                              COMMAND                  CREATED             STATUS              PORTS               NAMES
9a8e511ae360        osrf/subt-virtual-testbed:cloudsim_bridge_latest   "./run_bridge.bash w…"   5 minutes ago       Up 5 minutes                            frosty_jang
934268f6fcc5        osrf/subt-virtual-testbed:cloudsim_sim_latest      "./run_sim.bash clou…"   5 minutes ago       Up 5 minutes                            nervous_ritchie
cdb07335caeb        sln_bash                                           "/bin/bash"              2 hours ago         Up 2 hours                              distracted_joliot
gpu@gpu-desktop:~$ docker version
Client:
 Version:           18.09.7
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        2d0083d
 Built:             Thu Jun 27 17:56:23 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.7
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       2d0083d
  Built:            Thu Jun 27 17:23:02 2019
  OS/Arch:          linux/amd64
  Experimental:     false
gpu@gpu-desktop:~$ docker-compose version
docker-compose version 1.24.1, build 4667896b
docker-py version: 3.7.3
CPython version: 3.6.8
OpenSSL version: OpenSSL 1.1.0j  20 Nov 2018

osrf-migration commented 5 years ago

Original comment by Sophisticated Engineering (Bitbucket: sopheng).


Here are the results:

$ docker images | grep srf/subt-virtual-testbed
osrf/subt-virtual-testbed                                                        latest                        4b30c527463d        4 days ago          4.51GB
osrf/subt-virtual-testbed                                                        subt_solution_latest          bc0376256a85        2 weeks ago         3.36GB
osrf/subt-virtual-testbed                                                        cloudsim_sim_latest           dac9410605e6        2 weeks ago         4.55GB
osrf/subt-virtual-testbed                                                        cloudsim_bridge_latest        d8c6a32fef70        2 weeks ago         3.41GB

$ docker ps -a
CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS                  PORTS               NAMES
4459bfbd4668        sopheng_solution_devel   "/bin/bash"              6 hours ago         Up 6 hours                                  fervent_solomon
28be2cba9fa7        2f4ca2a5c282             "/bin/sh -c '/bin/ba…"   3 days ago          Exited (1) 3 days ago                       affectionate_bardeen
adf95a8ad95c        2f4ca2a5c282             "/bin/sh -c '/bin/ba…"   3 days ago          Exited (1) 3 days ago                       gallant_euler
8d1260cd9841        4d399cc50169             "/bin/sh -c '/bin/rm…"   6 days ago          Exited (1) 6 days ago                       great_boyd
d91dc7eb0e1d        4d399cc50169             "/bin/sh -c '/bin/rm…"   6 days ago          Exited (1) 6 days ago                       inspiring_kare

$ docker version
Client: Docker Engine - Community
 Version:           19.03.2
 API version:       1.40
 Go version:        go1.12.8
 Git commit:        6a30dfc
 Built:             Thu Aug 29 05:29:11 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.2
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.8
  Git commit:       6a30dfc
  Built:            Thu Aug 29 05:27:45 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

$ docker-compose version
docker-compose version 1.24.1, build 4667896b
docker-py version: 3.7.3
CPython version: 3.6.8
OpenSSL version: OpenSSL 1.1.0j  20 Nov 2018

I could reproduce it. It happened on every startup during the last 8 hours. After starting the docker container with 3 robots (one X2, one X3, and one X4) it took some minutes until the robots were shown in the SubT Simulator.

The last time now it was really quick: after 38 seconds all 3 robots and the tunnel were shown in the SubT Simulator. Thanks, for me it currently works. I will come back if it reoccurs.

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


:slight_smile: I see! I mean, I run everything remotely, so I do not see any SubT Simulator and the robots are maybe just not there yet - very good point! Thank you Sophisticated Engineering :slight_smile:

… so is there a way to check over the CLI that the robots are already “visible”?
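One hedged way to do that from the CLI (assuming `rostopic` is available inside the bridge or solution container) is to list topics and look for the robot name. The sketch below is a hypothetical helper; the matching logic is split out so it can be exercised without ROS:

```python
import subprocess

def robot_visible(topics, robot_name):
    """True if any ROS topic mentions the robot name (e.g. /X2L/cmd_vel)."""
    return any(robot_name in t for t in topics)

def check_robot(robot_name):
    # Assumes `rostopic` is on PATH inside the running container
    # (an assumption; this mirrors running `rostopic list` by hand).
    out = subprocess.run(["rostopic", "list"], capture_output=True, text=True)
    return robot_visible(out.stdout.splitlines(), robot_name)
```

On the Ignition side, `ign topic -l` could be grepped the same way.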

osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).


Martin, are you saying that you use ssh to run your solution on a remote machine?

Are you running with docker-compose or via 3 terminals to launch the sim, the bridge, and the solution?

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


I use ssh to a remote machine, and I run the 3 dockers (sim, bridge, solution), not docker-compose.

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


This just happened on AWS: Simple Tunnel 01, robotika, 1dfdc090-6495-437e-9e38-72ad9ce01660, robots A0F40F100F400L and B30F40F100F400R.

I was able to extract OSGAR logs from B, and I see that robot A never moved. Note that A started immediately (A0F) while B is supposed to wait 30 s (B30F). And for A I see:

0.a527ae8c41762a101acbf2474382f82acd977df2  Node Startup
0.a527ae8c41762a101acbf2474382f82acd977df2 INFO /tf_world_static [/tmp/binarydeb/ros-melodic-tf2-ros-0.6.5/src/static_transform_broadcaster_program.cpp:142(main)] [topics: /rosout, /tf_static] Spinning until killed publishing world to simple_tunnel_01
0.a527ae8c41762a101acbf2474382f82acd977df2 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:204(Controller)] [topics: /rosout] Waiting for /clock, /subt/start, and /subt/pose_from_artifact
137.748000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:212(Controller)] [topics: /rosout] Using robot name[A0F40F100F400L]

137.752000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:247(Update)] [topics: /rosout] Sent start signal.
137.796000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout SubT Challenge Ver2!
137.796000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout Waiting for robot_name ...
137.796000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:313(Update)] [topics: /rosout, /A0F40F100F400L/cmd_vel] MD robot pose -6 5 dist=61
137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout 0:00:01.950919 (0.0 0.0 0.0) []
137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout robot_name: A0F40F100F400L
137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout Use right wall: False
137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout Using times [0, 40, 100, 400]
137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout Artifacts before start: []
137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout Waiting for origin ...
137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout Origin: [-6.0, 5.0, 0.0615] A0F40F100F400L
137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout Offset: (-6.0, 5.0, 0.0615)
137.848000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout heading -0.0 angle -39.80557109226519 dist 7.810249675906654
137.948000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: /rosout, /A0F40F100F400L/cmd_vel] received Odom 0
137.948000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:76(imuCallback)] [topics: /rosout, /A0F40F100F400L/cmd_vel] received Imu 0 
137.948000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:85(scanCallback)] [topics: /rosout, /A0F40F100F400L/cmd_vel] received Scan 0
137.948000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:94(imageCallback)] [topics: /rosout, /A0F40F100F400L/cmd_vel] received Image 0
138.a527ae8c41762a101acbf2474382f82acd977df2 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout Image resolution (960, 1280, 3)
138.a527ae8c41762a101acbf2474382f82acd977df2 INFO /A0F40F100F400L/ros1_ign_bridge_twist [/tmp/ros1-ign-bridge-release/include/ros1_ign_bridge/factory.hpp:137(Factory<ROS1_T, IGN_T>::ros1_callback)] [topics: /rosout] Passing message from ROS1 geometry_msgs/Twist to Ignition ignition.msgs.Twist (showing msg only once per type
139.948000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(odomCallback)] [topics: /rosout, /A0F40F100F400L/cmd_vel] received Odom 100
1
...
213.200000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout 0:02:00.015000 (0.0 0.0 0.0) [('acc', 2058), ('image', 823), ('origin', 823), ('pose2d', 2057), ('rot', 2058), ('scan', 412), ('sim_time_sec', 42)]
...
391.200000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /A0F40F100F400L/cmd_vel] Python3: stdout 0:07:00.014022 (0.0 0.0 0.0) [('acc', 1750), ('image', 700), ('origin', 700), ('pose2d', 1750), ('rot', 1750), ('scan', 350), ('sim_time_sec', 35)]
...

simply it never moves.

Could it be related to

138.a527ae8c41762a101acbf2474382f82acd977df2 INFO /A0F40F100F400L/ros1_ign_bridge_twist [/tmp/ros1-ign-bridge-release/include/ros1_ign_bridge/factory.hpp:137(Factory<ROS1_T, IGN_T>::ros1_callback)] [topics: /rosout] Passing message from ROS1 geometry_msgs/Twist to Ignition ignition.msgs.Twist (showing msg only once per type 

?

on the “B” robot it was earlier:

136.564000000 INFO /B30F40F100F400R/ros1_ign_bridge_twist [/tmp/ros1-ign-bridge-release/include/ros1_ign_bridge/factory.hpp:137(Factory<ROS1_T, IGN_T>::ros1_callback)] [topics: /rosout] Passing message from ROS1 geometry_msgs/Twist to Ignition ignition.msgs.Twist (showing msg only once per type

as I said, this code is based on the subt_seed example

osrf-migration commented 5 years ago

Original comment by Arthur Schang (Bitbucket: Arthur Schang).


Martin, in your velocity controller/issuer, is there any sort of logic that will time out if your autonomy stack does not see the robot respond to the velocity commands it is issuing (this is the case for move_base)? Would your velocity controller continue sending commands throughout the entire simulation regardless of whether or not the robot had responded to commands in the past? It seems the velocity controller startup is delayed, as you have pointed out, and that could be an issue. To clarify: robot B moved but robot A did not?

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


The controller (Python3) sends commands on a timer, and the desired speed change is triggered by odometry readings. If the robot does not turn, it will keep sending the turn command (that is the command mentioned at the beginning):

0:00:02.305059 20 b'cmd_vel 0.000000 -0.785398'

I do not have the log from AWS unfortunately as the robot dumps it when the task is over.

On the simulation side, if there is any delay, I suppose there is some watchdog and the robot stops, but it restarts with a new velocity command, right?
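The behaviour described here - keep commanding a turn until odometry shows the heading has changed, then drive - can be sketched as pure logic. Names and values below are illustrative (the real controller lives in the OSGAR/subt_seed code); the turn rate matches the `cmd_vel 0.000000 -0.785398` seen in the logs:

```python
import math

def next_cmd(desired_heading, odom_heading, tolerance=0.05, turn_speed=0.785398):
    """Return a (linear, angular) command for the next timer tick.

    Keeps issuing the same turn command until odometry reports the
    desired heading within tolerance; only then drives forward.
    """
    err = desired_heading - odom_heading
    if abs(err) > tolerance:
        return 0.0, math.copysign(turn_speed, err)
    return 0.5, 0.0  # forward speed is an illustrative value
```

With such a controller, a robot whose cmd_vel is not reaching the simulation never sees its odometry change, so it keeps repeating the same turn command forever - consistent with the logs above.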

osrf-migration commented 5 years ago

Original comment by Arthur Schang (Bitbucket: Arthur Schang).


As long as you’re sending velocity commands to the correct topic, the robot should move. Is there any way your Python script failed to pass a blocking statement before sending commands? Do you have a printout to see when it begins sending commands to the robot?

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


I do not have any printout of sent commands from the C++ code … I can add one for debugging, or a counter of how many messages were sent.

My colleague just suggested joining the running solution docker and running rostopic info/echo on the velocity commands … that way I should see if anybody else is listening. Locally I can get to this state by starting the simulation and bridge sooner than the solution docker … so hopefully I will know more in, say, 1 hour.

thanks m.

osrf-migration commented 5 years ago

Original comment by Arthur Schang (Bitbucket: Arthur Schang).


It looks like your most recent run ended, sorry I wasn’t able to get on Cloudsim in time to check. Did it work as intended or are you still having the same issue?

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


If it was “Virtual STIX” then the result was the same (robot A did not move, B moved as intended).

osrf-migration commented 5 years ago

Original comment by Arthur Schang (Bitbucket: Arthur Schang).


Wait, did you submit on Virtual STIX earlier on the previous run? The logs you posted above are from simple tunnel 01.

I’ll respond to your email sent to subt-help@googlegroups.com. I should just jump into your instance while it’s running and poke around to see if I can figure out if there’s an issue on the Cloudsim side.

Do vehicles A and B ALWAYS move as expected with a local Cloudsim run? If it does not work locally there is even less chance of it working remotely.

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


I am waiting for the “SubT Portal Upgrade” mentioned in the e-mail 2 hours ago. I noticed this problem on AWS for the first time on “Virtual STIX”. As soon as AWS is available again, I can re-run it on other world(s). I added an extra 10 s (simulated) wait in the robot name code even for the first robot … but I may intentionally revert it … also note that the first number after the 1st letter (A, B, C, D, E?) is how long the robot has to wait before it can start navigating. So for A100F*R it is expected that the robot waits 100 s before it moves. The failing case was A0F*.

As usual, Murphy is satisfied … I am currently not able to force it to fail locally … when everything is working fine, then:

developer@gpu-desktop:~/subt_solution$ rostopic info /A0F100R/cmd_vel
Type: geometry_msgs/Twist

Publishers:
 * /subt_solution (http://gpu-desktop:37551/)

Subscribers:
 * /A0F100R/ros1_ign_bridge_twist (http://gpu-desktop:32957/)
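For an automated version of this check, the `rostopic info` output above can be parsed to verify that the bridge is actually subscribed to cmd_vel. A small hypothetical helper, written against the output pasted here:

```python
def parse_subscribers(rostopic_info_output):
    """Extract subscriber node names from `rostopic info` output."""
    subs, in_subs = [], False
    for line in rostopic_info_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Subscribers:"):
            in_subs = True
        elif stripped.startswith("Publishers:"):
            in_subs = False
        elif in_subs and stripped.startswith("*"):
            # line looks like: * /A0F100R/ros1_ign_bridge_twist (http://...)
            subs.append(stripped.lstrip("* ").split()[0])
    return subs
```

If the returned list is empty, cmd_vel is going nowhere and the run could be restarted early instead of waiting out the whole simulation.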

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


OK, in the very last simulation before the upgrade (robotika, ver24p01x4, 5d281e38-5571-4653-9847-3b4838eb59a2, Practice Tunnel 01) robot C (C120F100F400F1000L) failed. There were no odometry data, and after 35 minutes there were no lidar scans either. So:

  1. it is not related to Virtual STIX
  2. it can happen even after several minutes of waiting (here 120 s of simulated time)
  3. it is probably related to missing odometry data

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


only robot D survived and recorded the other robots … you can see the lidar trace of robots A and B entering, and C not moving even 6 minutes after the start (real time)

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


Robot C:

447.932000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /C120F100F400F1000L/cmd_vel] Python3: stdout 0:35:00.182576 (0.0 0.0 0.0) [('acc', 575), ('image', 230), ('origin', 230), ('rot', 575), ('scan', 89)] 

459.332000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:403(getSpeedCmd)] [topics: /rosout, /C120F100F400F1000L/cmd_vel] Python3: stdout 0:36:00.194315 (0.0 0.0 0.0) [('acc', 570), ('image', 227), ('origin', 228), ('rot', 570)]

and the loss of scans partially (by 1 minute) correlates with a hard crash of robot D - acc data:

0:36:20.722400 16 [236, 1742, 9837]
0:36:20.738244 16 [-176, 6251, 15514]
0:36:20.909687 16 [148, 3073, 9173]

but maybe it is just the “acc jitter” described in #166

osrf-migration commented 4 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).


Closing due to long time lapse, and I think the original issue was resolved.