osrf / subt

This repostory contains software for the virtual track of the DARPA SubT Challenge. Within this repository you will find Gazebo simulation assets, ROS interfaces, support scripts and plugins, and documentation needed to compete in the SubT Virtual Challenge.
Other
309 stars 97 forks source link

Slow AWS initialization? #202

Closed osrf-migration closed 5 years ago

osrf-migration commented 5 years ago

Original report (archived issue) by Martin Dlouhy (Bitbucket: robotikacz).


I am reviewing another failing run and it looks like it takes more then 6 minutes to initialize the solution ROS node:

0.a527ae8c41762a101acbf2474382f82acd977df2 Node Startup
0.a527ae8c41762a101acbf2474382f82acd977df2 INFO /tf_world_static [/tmp/binarydeb/ros-melodic-tf2-ros-0.6.5/src/static_transform_broadcaster_program.cpp:142(main)] [topics: /rosout, /tf_static] Spinning until killed publishing world to simple_tunnel_02
0.a527ae8c41762a101acbf2474382f82acd977df2 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(Controller)] [topics: /rosout] Waiting for /clock, /subt/start, and /subt/pose_from_artifact
375.900000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:111(Controller)] [topics: /rosout] Using robot name[X2]

375.908000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:146(Update)] [topics: /rosout] Sent start signal.

Is it expected?? (robotika, simple_tunnel_02, 76640b06-334f-47eb-8dd9-b09202a14ca8)

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).


osrf-migration commented 5 years ago

Original comment by Arthur Schang (Bitbucket: Arthur Schang).


This is about the same as I experience when running on AWS. Your allotted one-hour of simulation time starts whenever you call /subt/start or enter the environment though.

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


OK, thank Arthur - my problem is not “the competition hour” but that our code works fine locally and does not work at all on AWS … so I am looking for the root cause … and this could be another one, that it does not wait long enough. We have two “nodes” - one based on subt_seed example and another Python3 OSGAR code talking to ROS via “ROSProxy” … and I have no indication that the Python3 “node” actually started.

osrf-migration commented 5 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).


The simulation container is started first, then the bridge+solution container is started. The whole process of starting a run can take multiple minutes.

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


Is there something specific about the ROS nodes, like do they have to be all listed in launch file or something? I suppose that “Scientific Systems Company” will not share info how they solved it … but even simple Python2 ROS code (subscribe and wait for sensors) is not listed in rosout … any hint is welcome. Thanks.

p.s. or do all nodes have to query “/subt/start”??

osrf-migration commented 5 years ago

Original comment by Arthur Schang (Bitbucket: Arthur Schang).


Martin, can you please provide us the ENTRYPOINT (i.e. subt_seed has ENTRYPOINT ["./run_solution.bash"]) to your docker image and the associated bash scripts and launch file(s) so we can better investigate the issue. Either provide them here or in a email to subt-help@googlegroups.com. I’m 100% I can solve and clarify what the issue in a very quick manner is if I can show you by-example using your files whereas it may take many more back-and-forth exchanges without being able to point out your specific issue.

osrf-migration commented 5 years ago

Original comment by Arthur Schang (Bitbucket: Arthur Schang).


Nonetheless, here is a generic example answer from the subt_helloworld development branch example:

Ensure you have an entrypoint set in your Dockerfile such as ENTRYPOINT ["./run_sim.sh"] from here. Also note that this script needs to be copied into the image whenever you’re building the docker image.

Inside of the script you call as the entrypoint, ensure that you are sourcing your environment appropriately as ~/.bashrc is not sourced when entering in this manner. If you call multiple launch files or nodes inside of this script, ensure you start all but the last one you want to hang with an & to run the predecessor scripts/files/nodes in the background to ensure everything is launched up until your last line that will hang until the end of the run. An example from run_sim.sh is here:

#!/usr/bin/env bash

. /opt/ros/melodic/setup.bash
. ~/subt_ws/install/setup.sh
. ~/workspaces/catkin_build_ws/devel/setup.bash

exec roslaunch helloworld_subt_launch robot.launch name:=alpha other_robot:=bravo perception:=true comms:=false control:=true fake_report:=false

while the launch file ran example is here:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<launch>
  <!-- The name/namespace of this robot -->
  <arg name="name"/>
  <!-- The type of this robot (e.g. x1, x2, etc) -->
  <arg name="type" default="x1"/>
  <!-- Enable mapping subsystem -->
  <arg name="mapping" default="true"/>
  <!-- The mapping mode of this robot (e.g. 2d or 3d) -->
  <arg name="mapping_mode" default="3d"/>
  <!-- Enable control subsystem (wall-following behavior) -->
  <arg name="control" default="true" />
  <arg name="wall_follow" default="false"/>
  <arg name="frontier_exploration" default="true"/>
  <!-- Enable perception (object detection) -->
  <arg name="perception" default="true"/>
  <!-- Enable tensorflow object detector -->
  <arg name="tensorflow_object_detector" default="true"/>
  <!-- Enable Darknet (DNN object classifier) -->
  <arg name="darknet" default="false"/>
  <!-- Enable ground truth perception (logical camera) -->
  <arg name="perception_ground_truth" default="false"/>
  <!-- Enable object detection communication -->
  <arg name="comms" default="true"/>
  <arg name="other_robot"/>

  <!-- You'll want this if your odometry frame on <vehicle_name>/odom is <vehcle_name>->base_link without a prefix -->
  <node pkg="tf2_ros" type="static_transform_publisher" name="static_base" args=" 0 0 0 0 0 0 $(arg name)/base_link base_link"/>

  <group ns="$(arg name)">

    <!-- Clear out IMU bias on startup -->
    <node pkg="rosservice" type="rosservice" name="calib_imu" args="call --wait imu/data/set_accel_bias 'bias: {x: 0.0, y: 0.0, z: 0.0}'"/> 
    <node pkg="rosservice" type="rosservice" name="calib_gyro" args="call --wait imu/data/set_rate_bias 'bias: {x: 0.0, y: 0.0, z: 0.0}'"/> 

    <!-- Mapping setup -->
    <group if="$(arg mapping)">
      <!-- Cartographer setup -->
      <include file="$(find helloworld_subt_launch)/launch/cartographer.launch">
        <arg name="name" value="$(arg name)"/>
        <arg name="mode" value="$(arg mapping_mode)"/>
        <arg name="type" value="$(arg type)"/>
      </include>

      <node pkg="nodelet" type="nodelet" name="pcl_manager" args="manager" output="screen" />

      <!-- Run a passthrough filter to clean self returns -->
      <node pkg="nodelet" type="nodelet" name="passthrough_x" args="load pcl/PassThrough pcl_manager" output="screen">
        <remap from="~input" to="points" />
        <remap from="~output" to="ptx" />
        <rosparam>
          filter_field_name: x
          filter_limit_min: -0.40
          filter_limit_max:  0.40
          filter_limit_negative: True
        </rosparam>
      </node>

      <node pkg="nodelet" type="nodelet" name="passthrough_y" args="load pcl/PassThrough pcl_manager" output="screen">
        <remap from="~input" to="ptx" />
        <remap from="~output" to="points_filtered" />
        <rosparam>
          filter_field_name: y
          filter_limit_min: -0.40
          filter_limit_max:  0.40
          filter_limit_negative: True
        </rosparam>
      </node>

      <!-- Octomap setup -->
      <node pkg="octomap_server" type="octomap_server_node" name="octomap_server">
        <param name="resolution" value="0.1" />
        <!-- fixed map frame (set to 'map' if SLAM or localization running) -->
        <param name="frame_id" type="string" value="map" />
        <!-- maximum range to integrate (speedup!) -->
        <param name="sensor_model/max_range" value="10.0" />
        <!-- data source to integrate (PointCloud2) -->
        <remap from="cloud_in" to="points_filtered"/>
        <param name="filter_ground" value="true"/>
        <param name="ground_filter/distance" value="0.12"/>
        <param name="ground_filter/plane_distance" value="0.2"/>
        <param name="base_frame_id" value="$(arg name)/base_link"/>
        <param name="pointcloud_max_z" value="1.8"/>
        <param name="occupancy_min_z" value="0.2"/>
      </node>

    </group>

    <!-- Control setup -->
    <group if="$(arg control)">
      <!-- Extract planar laser scan -->
      <node pkg="pointcloud_to_laserscan" type="pointcloud_to_laserscan_node" name="ptcld_to_lsrscn">
        <param name="~min_height" value="0.1"/>
        <param name="~max_height" value="1.0"/>
        <param name="~range_max" value="10.0"/>
        <remap from="cloud_in" to="points"/>
      </node>

      <!-- Controller -->
      <!-- Wall Follower -->
      <node if="$(arg wall_follow)" pkg="wall_follow" type="wall_follow_node" name="wall_follow" output="screen">
        <param name="name" value="$(arg name)"/>
        <param name="base_link" value="$(arg name)/base_link"/>
        <param name="start_delay" value="0"/>
      </node>

      <!-- Frontier Exploration http://wiki.ros.org/explore_lite -->
      <node if="$(arg frontier_exploration)" pkg="explore_lite" type="explore" respawn="false" name="explore" output="screen">
        <param name="robot_base_frame" value="$(arg name)/base_link"/>
        <param name="costmap_topic" value="move_base/global_costmap/costmap"/>
        <param name="costmap_updates_topic" value="move_base/global_costmap/costmap_updates"/>
        <param name="visualize" value="true"/>
        <param name="planner_frequency" value="0.33"/>
        <param name="progress_timeout" value="30.0"/>
        <param name="potential_scale" value="3.0"/>
        <param name="orientation_scale" value="0.0"/>
        <param name="gain_scale" value="1.0"/>
        <param name="transform_tolerance" value="0.3"/>
        <param name="min_frontier_size" value="0.5"/>
      </node>

      <!-- Required for both wall follow and frontier exploration http://wiki.ros.org/husky_navigation -->
      <node pkg="move_base" type="move_base" respawn="false" name="move_base" output="screen">
        <param name="base_global_planner" value="navfn/NavfnROS"/>
        <param name="base_local_planner" value="dwa_local_planner/DWAPlannerROS"/>
        <rosparam file="$(find helloworld_subt_launch)/config/planner.yaml" command="load" />

        <rosparam file="$(find helloworld_subt_launch)/config/costmap_common_params.yaml" command="load" ns="global_costmap" />
        <rosparam file="$(find helloworld_subt_launch)/config/costmap_common_params.yaml" command="load" ns="local_costmap" />
        <rosparam file="$(find helloworld_subt_launch)/config/local_costmap_params.yaml" command="load" />
        <rosparam file="$(find helloworld_subt_launch)/config/global_costmap_params.yaml" command="load" />

        <remap from="map" to="projected_map"/>
        <!--<rosparam file="$(find helloworld_subt_launch)/config/base_local_planner_params.yaml" command="load" />-->
      </node>
      <!--<node pkg="move_base" type="move_base" respawn="false" name="move_base" output="screen">
        <param name="base_global_planner" value="navfn/NavfnROS"/>
        <param name="base_local_planner" value="dwa_local_planner/DWAPlannerROS"/>
        <rosparam file="$(find helloworld_subt_launch)/config/planner.yaml" command="load" />

        <rosparam file="$(find helloworld_subt_launch)/config/costmap_common.yaml" command="load" ns="global_costmap" />
        <rosparam file="$(find helloworld_subt_launch)/config/costmap_common.yaml" command="load" ns="local_costmap" />

        <rosparam file="$(find helloworld_subt_launch)/config/costmap_local.yaml" command="load" />
        <param name="local_costmap/width" value="10.0"/>
        <param name="local_costmap/height" value="10.0"/>

        <rosparam file="$(find helloworld_subt_launch)/config/costmap_global_static.yaml" command="load" ns="global_costmap" />
        <remap from="map" to="$(arg name)/projected_map"/>
      </node>-->
      <!--<node pkg="move_base" type="move_base" respawn="false" name="move_base" output="screen">
        <rosparam file="$(find helloworld_subt_launch)/config/costmap_common_params.yaml" command="load" ns="global_costmap" />
        <rosparam file="$(find helloworld_subt_launch)/config/costmap_common_params.yaml" command="load" ns="local_costmap" />
        <rosparam file="$(find helloworld_subt_launch)/config/local_costmap_params.yaml" command="load" />
        <rosparam file="$(find helloworld_subt_launch)/config/global_costmap_params.yaml" command="load" />
        <rosparam file="$(find helloworld_subt_launch)/config/base_local_planner_params.yaml" command="load" />
      </node>-->

    </group>

    <!-- Perception setup -->
    <group if="$(arg perception)">
      <!-- Use tensorflow object detector -->
      <node if="$(arg tensorflow_object_detector)" pkg="tensorflow_object_detector" name="detect_ros" type="detect_ros.py" output="screen"> 
        <remap from="image" to="front/image_raw"/>
      </node>

      <!-- Use logical camera if ground truth is on - Depreciated in Ignition - -->
      <node if="$(arg perception_ground_truth)" pkg="gazebo_logical_camera" name="logical_camera_ros" type="logical_camera_object_reporter_node" output="screen">
        <param name="detection_frame_id" value="$(arg name)/base_link"/>
        <param name="report_frame_id" value="$(arg name)/map"/>
        <rosparam param="objects_to_recognize">
          backpack_1: 1
          toolbox_1: 2
          extinguisher_1: 3
          valve_1: 4
          radio_1: 5
          Backpack: 7
        </rosparam>
      </node>

      <!-- Image labeling for all forms of perception -->
      <node pkg="camera_object_projector" type="camera_object_projector_node" name="camera_object_projector" output="screen">
        <remap from="image" to="front/image_raw"/>
        <remap from="camera_info" to="front/camera_info"/>
        <remap from="object_detections" to="objects"/>
      </node> 

    </group>

    <!-- A tf from map to ns/map is needed for each robot (this is not correct but it demonstrates functionality) - Allows map to be dynamic for move base on multiple vehicles -->
    <node pkg="tf2_ros" type="static_transform_publisher" name="map_to_ns_map_stf" args="0 0 0 0 0 0 map $(arg name)/map" />

    <!-- Communication setup -->
    <group if="$(arg comms)">
      <!-- Communication package -->
      <node pkg="comms_example" name="object_detection_comms" type="object_detection_node" args="$(arg name) $(arg name) $(arg other_robot) $(arg other_robot)" output="screen" />
    </group>

  </group>

</launch>

I hope the generic example is helpful for others. Note: this has previously worked for me and there have been no changes to this structure since it was last tested. I did not re-test it before posting this though so there may be small differences/issues present from updates.

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


Arthur, the docker image is based on subt_seed example:

FROM osrf/subt-virtual-testbed:subt_solution_latest

RUN /usr/bin/python3 -m pip install --user --no-cache-dir msgpack pyserial "opencv-python>3.4.2,<4"
RUN git clone https://github.com/robotika/osgar.git osgar \
  && cd osgar \
  && git checkout feature/virtual-ignition

COPY run_solution.bash ./

COPY subt_seed_node.cc ./src/subt_seed/src/subt_seed_node.cc
RUN /bin/bash -c 'source /opt/ros/melodic/setup.bash && catkin_make install'

ENTRYPOINT ["./run_solution.bash"]

and run_solution.bash is currently

#!/usr/bin/env bash

. /opt/ros/melodic/setup.bash
. ~/subt_solution/install/setup.sh

# Wait for the bridge
sleep 30

# Run your solution.
roslaunch subt_seed x1.launch &

sleep 10

cd osgar

python subt/wait_for_sensors.py

python3 -m subt run subt/subt-x2.json --side left --walldist 0.75 --timeout 180 --note "try to visit artifact and return home" 2>&1 | python subt/std2ros.py

For ver5 I added wait_for_sensors.py and to get “stdout” I added also std2ros.py

osrf-migration commented 5 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).


The subt_seed is meant to be an example that you can use as a reference. There are numerous hardcoded values and behaviors in the subt_seed. For example, the subt_seed will automatically call /subt/start and you may not be ready to start the run. The subt_seed will also send velocity commands to a robot, and periodic messages through the comms bridge.

At a minimum, remove roslaunch subt_seed x1.launch & line in your run_solution.bash. I would strongly suggest creating your own solution that doesn't depend on the subt_seed.

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


I modified the original example code and I use it in order to report artifacts - I was not able to do that with older code and at the end I gave up and modified working subt_seed (see 2 points in “Practice Tunnel 01”). I also removed all code related to sending commands - now it initializes communication, sends /subt/start and queries the initial robot pose.

Or do you mean that even in the wait_for_sensors.py I should send /subt/start? It is optional so I suppose no?

Here is the hacked subt_seed code: https://github.com/robotika/osgar/blob/feature/virtual-ignition/subt/src/subt_seed_node.cc

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


p.s. wait_for_sensors.py is here https://github.com/robotika/osgar/blob/feature/virtual-ignition/subt/wait_for_sensors.py

osrf-migration commented 5 years ago

Original comment by Hector Escobar (Bitbucket: hector_escobar).


Martin Dlouhy (robotikacz) ,

We use a main.launch file where we call all the required nodes (including python nodes) as shown in the example launch file above. Then our entry point looks like

#!/usr/bin/env bash

. /opt/ros/melodic/setup.bash
. ~/subt_solution/install/setup.sh

# Wait for the bridge
sleep 30

# Run your solution.
exec roslaunch ssci_nodes main.launch

My suggestion would be you call all your python scripts inside the your x1.launch. I am not sure your last line gets called as python subt/wait_for_sensors.py doesn’t run in the background (maybe add &)

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


Hector, thank you :slight_smile: - the wait_for_sensors.py should finish as soon as it gets a message for each topic. I added it there to be sure that it is ready, but it did not help much. I will look into launch files, thank you.

osrf-migration commented 5 years ago

Original comment by Arthur Schang (Bitbucket: Arthur Schang).


It looks like everyone above has posted some great advice. I would remove subt_seed code as soon as you get your own code working as it may interfere (although it sounds like you’ve modified it a good bit). I concur with Hector here, the recommended way to launch a large number of ROS nodes is by using launch files to launch all of your nodes together although having a script such as python subt/wait_for_sensors.py to listen for sensors then exit is a good idea and might remove the need for the sleep. Although, the bridge should be up so I don’t believe you’ll have to add those sleeps in for Cloudsim.

It sounds as if you’re making progress though and you’ve started to get everything working. Please keep pinging back on this issue with additional questions/issues.

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


Arthur, thanks but I do not share your optimism any more :wink: … I would like to drop subt_seed completely, but then I have issue with reporting artifacts (I would almost day “damn artifacts”). I suppose that the communication client should work without ROS - to use zeromq directly is probably another way to hell. One launch file sounds the way to go, but the rest is still unclear to me …

OT our desperate “progress report” is here: https://robotika.cz/competitions/subtchallenge/virtual-tunnel-circuit/en

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


p.s. I was very naive - even the communication broker needs ROS:

developer@gpu-desktop:~/subt_solution/osgar/subt/src$ g++ report_artf.cc -I/home/developer/subt_solution/install/include/
In file included from report_artf.cc:26:0:
/home/developer/subt_solution/install/include/subt_communication_broker/subt_communication_client.h:19:10: fatal error: ros/ros.h: No such file or directory
 #include <ros/ros.h>
          ^~~~~~~~~~~
compilation terminated.

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


p.s.2 Arthur, I decided to go “opposite direction” i.e. to extend something, which is already working (subt_seed_node). I added four subsrcibers (odometry, imu, scan and image) and uncommented publisher cmdVel. As callback functions I added only ROS_INFO what message I received, OK? I just submitted “ver7” to see what AWS will do, but locally I got “strange” outpus

[ INFO] [1569563527.221367007, 160.032000000]: received Imu
[ INFO] [1569563527.221439372, 160.032000000]: received Scan
[ INFO] [1569563527.221498143, 160.032000000]: received Imu
[ INFO] [1569563527.221542285, 160.032000000]: received Imu
[ INFO] [1569563527.221575952, 160.032000000]: received Imu
[ INFO] [1569563527.221634350, 160.032000000]: received Scan

no Image, no Odom … I will post AWS output as soon as it finishes.

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


(the hacked code is here https://github.com/robotika/osgar/commit/0a3251d018efad03da0bb1a50938c100df78412b)

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


(sure enough I have mixed topic name and type of Odometry, but we will se about Image)

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


354.216000000 ERROR /X2/ros1_ign_image_camera [/tmp/binarydeb/ros-melodic-roscpp-1.14.3/src/libros/publication.cpp:498(Publication::validateHeader)] [topics: /rosout, /X2/front/image_raw/compressedDepth, /X2/front/image_raw/compressedDepth/parameter_descriptions, /X2/front/image_raw/compressedDepth/parameter_updates, /X2/front/image_raw/compressed, /X2/front/image_raw/compressed/parameter_descriptions, /X2/front/image_raw/compressed/parameter_updates, /X2/front/image_raw, /X2/front/image_raw/theora, /X2/front/image_raw/theora/parameter_descriptions, /X2/front/image_raw/theora/parameter_updates] Client [/subt_solution] wants topic /X2/front/image_raw/compressed to have datatype/md5sum [sensor_msgs/Image/060021388200f6f0f447d0fcd9c64743], but our version has [sensor_msgs/CompressedImage/8f7a12909da2c9d3332d540a0977563f]. Dropping connection. 

so I mixed somehow also the image …

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


Arthur, I will probably return the sleep 30 back to run_simulation.bash … the last run terminated with “Error: InitializationFailed” (BTW it is pity, that I was not notified by e-mail :disappointed: … only successful run is reported). [robotika, simple_tunnel_02, ver8, 177c358b-535f-4686-85a7-e83e7923e5f0]

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


(running ver9 which differs only by this extra sleep at the beginning)

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


I can confirm that this extra sleep at the beginning helped (ver9).

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


I will switch to single node which waits at the beginning, thanks m.

osrf-migration commented 5 years ago

Original comment by Arthur Schang (Bitbucket: Arthur Schang).


I’m glad you were able to get consistent initialization figured out. It looks like you are aware of why your subscribers didn’t work as well (when defined, datatypes must match). As far as building off the subt_seed example goes, hopefully it works for you!

osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).