hello-robot / stretch_web_teleop

Remote web teleoperation for the Stretch mobile manipulators from Hello Robot Inc.

Fix high latency, CPU, memory usage #61

Closed · hello-amal closed this 2 months ago

hello-amal commented 2 months ago

Description

This PR addresses three issues related to efficiency:

  1. We've been noticing higher latency in the web app's images than before.
  2. When the gripper camera's depth sensing overlay is enabled, CPU usage for that process spikes to as much as 900%.
  3. The web app browser uses considerable memory.

The PR addresses each of the above issues as follows:

  1. Latency:
    1. Rather than treating the expanded gripper view as a separate camera stream / ROS topic, this PR has it reuse the same topic as the default gripper view, and adds a ROS service to toggle the expanded gripper view on/off (see the rclpy sketch after this list).
    2. Ensures all subscribers/publishers use a QoS reliability of "best effort" and a history depth (queue size) of 1.
    3. This PR also reduces the commanded FPS for the navigation camera from 100 to 15.
    4. This PR splits each video stream into a separate node, to avoid callbacks from one stream starving another of threads.
    5. This PR also adds a convenient way to measure latency: toggle verbose on here and then run pm2 logs start_robot_browser. Note that this measures only the latency between when librealsense gets the frame and when the robot browser receives it, but that is the bulk of the latency; WebRTC latency (which can be checked at chrome://webrtc-internals/) is in the tens of milliseconds.
  2. CPU Load:
    1. This PR downsamples pointclouds before doing any matrix multiplications with them (see the numpy sketch after this list). To avoid a grainy-looking overlay, it also expands the pixels that are being overlaid.
    2. This PR filters the gripper pointcloud to points within 30cm depth before deprojecting the pixels, to further reduce the pointcloud size prior to matrix multiplication.
    3. Overall, before doing any matrix multiplications, this PR reduces the Realsense pointcloud from ~45K to ~15K points, and the gripper pointcloud from ~92K to ~12K points.
  3. Memory Usage: Instead of setting the video source directly to the image data, this PR first converts the data into a blob and creates an object URL for it. This lets us explicitly revoke the object URL before rendering the next image, which prevents the browser from caching every frame.
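
As a concrete reference for items 1.1, 1.2, and 1.5 above, here is a minimal rclpy sketch of a camera-stream node combining a "best effort", depth-1 subscription, a view-toggle service, and per-frame latency logging. The node, topic, and service names are placeholders rather than the ones in this repo, and std_srvs/SetBool stands in for whatever service type the PR actually uses; treat it as an illustration of the pattern, not the repo's code.

import rclpy
from rclpy.node import Node
from rclpy.qos import HistoryPolicy, QoSProfile, ReliabilityPolicy
from rclpy.time import Time
from sensor_msgs.msg import Image
from std_srvs.srv import SetBool


class GripperStreamNode(Node):
    """Sketch: best-effort/depth-1 QoS, a view-toggle service, latency logging."""

    def __init__(self):
        super().__init__("gripper_stream_node")
        self.expanded_view = False
        # Best-effort reliability and a history depth of 1 drop stale frames
        # instead of queueing them, so the newest image is always delivered.
        qos = QoSProfile(
            reliability=ReliabilityPolicy.BEST_EFFORT,
            history=HistoryPolicy.KEEP_LAST,
            depth=1,
        )
        # Placeholder topic and service names.
        self.create_subscription(
            Image, "/gripper_camera/image_raw", self.image_callback, qos
        )
        self.create_service(SetBool, "expanded_gripper_view", self.toggle_callback)

    def toggle_callback(self, request, response):
        # One service toggles the expanded view on the existing topic, rather
        # than publishing the expanded view as a second camera stream.
        self.expanded_view = request.data
        response.success = True
        return response

    def image_callback(self, msg: Image):
        # Receive latency = time now minus the frame's header stamp.
        latency = self.get_clock().now() - Time.from_msg(msg.header.stamp)
        self.get_logger().info(f"Frame latency: {latency.nanoseconds / 1e9:.3f}s")


if __name__ == "__main__":
    rclpy.init()
    rclpy.spin(GripperStreamNode())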
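
Similarly, here is a sketch of the filter-then-downsample idea from items 2.1 and 2.2, using numpy. The array sizes, the stride, and the transform are illustrative assumptions, not the repo's values; the point is that the depth mask and the downsampling are applied before any matrix multiplication.

import numpy as np

def filter_and_downsample(points: np.ndarray, max_depth_m: float = 0.3,
                          stride: int = 4) -> np.ndarray:
    """Shrink an (N, 3) pointcloud before multiplying it by a 4x4 transform.

    points holds (x, y, z) camera-frame coordinates, with z as depth in meters.
    """
    # Keep only points within max_depth_m (30cm here) of the camera.
    points = points[points[:, 2] <= max_depth_m]
    # Downsample by keeping every stride-th remaining point.
    return points[::stride]

# Example: reduce a gripper-sized cloud, then apply a (placeholder) transform.
cloud = np.random.rand(92_000, 3)           # stand-in for a ~92K-point cloud
reduced = filter_and_downsample(cloud)      # far fewer rows to multiply
homogeneous = np.hstack([reduced, np.ones((len(reduced), 1))])
camera_to_base = np.eye(4)                  # placeholder 4x4 transform
transformed = homogeneous @ camera_to_base.T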

Note that because this PR is the last of a chain of three PRs that will all be merged together (#52 , #57 , and this one), it also includes some final fixes for those two PRs:

  1. Fixing color convention for beta teleop camera.
  2. Adjusting joint lift in click-to-pregrasp for objects near the ground, to avoid colliding into the base.
  3. Monitoring joint efforts and terminating click-to-pregrasp if they exceed a threshold (indicating a collision).

Note on an additional possible speedup

We did notice that enabling multicast on the loopback interface, switching the ROS middleware to CycloneDDS, and forcing CycloneDDS to use loopback shaves off tens of milliseconds of latency (~30-50ms). However, this change prevents other computers on the network from accessing the ROS nodes/topics, which is why we didn't push it. For documentation purposes, the steps are as follows:

First, enable multicast on the loopback interface and install CycloneDDS:

# Enable multicast on the loopback interface and route multicast traffic to it
sudo ifconfig lo multicast
sudo route add -net 224.0.0.0 netmask 240.0.0.0 dev lo

# Install the CycloneDDS RMW implementation, select it, and rebuild the workspace
sudo apt install ros-humble-rmw-cyclonedds-cpp
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
cd ~/ament_ws
colcon build

Second, create an XML file with the following contents:

<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
  <Domain id="any">
    <General>
      <Interfaces>
        <NetworkInterface name="lo"/>
      </Interfaces>
    </General>
  </Domain>
</CycloneDDS>

Finally, tell CycloneDDS to use that configuration: export CYCLONEDDS_URI=<absolute_path_to_file>.xml

Note that the above commands need to be re-run every time you restart your robot. To make them permanent, add the two environment variable exports (RMW_IMPLEMENTATION and CYCLONEDDS_URI) to ~/.bashrc, and add the two network configuration commands (ifconfig and route) to a shell script that runs on startup as root.

Testing procedure

(All tests done on stretch-se3-3001, with a nearly-fresh software install)

  1. Measure latency before and after this change, with all three video streams on the operator web interface. (For "before", revert all of this PR except the verbose logging of robot-browser receive latency described above, and measure latency from those verbose logs.)
    1. Default streams:
      1. Before Change:
        1. Nav Cam: 0.03-0.07s
        2. D435: 0.32-0.38s
        3. D405: 0.07-0.13s
      2. After Change:
        1. Nav Cam: 0.03-0.10s
        2. D435: 0.08-0.11s
        3. D405: 0.13-0.19s
    2. Realsense Depth Sensing Only:
      1. Before Change:
        1. Nav Cam: 0.04-0.07s
        2. D435: 0.53-0.59s
        3. D405: 0.08-0.12s
      2. After Change:
        1. Nav Cam: 0.04-0.10s
        2. D435: 0.16-0.26s
        3. D405: 0.08-0.13s
    3. Expanded Gripper Only (undo the above):
      1. Before Change:
        1. Nav Cam: 0.04-0.08s
        2. D435: 0.35-0.42s
        3. D405: 0.09-0.14s
      2. After Change:
        1. Nav Cam: 0.03-0.10s
        2. D435: 0.14-0.18s
        3. D405: 0.11-0.15s
    4. Gripper Depth Sensing Only (undo the above):
      1. Before Change:
        1. Nav Cam: 0.08-0.13s
        2. D435: 0.30-0.70s
        3. D405: 0.21-0.40s
        4. (Also, stream updates are received by the web app in batches, e.g., several nav frames, then several gripper and Realsense frames, etc.)
      2. After Change:
        1. Nav Cam: 0.04-0.10s
        2. D435: 0.11-0.15s
        3. D405: 0.15-0.17s
        4. (And the streams are nicely interspersed, not batched like above)
    5. Expanded Gripper + Gripper Depth Sensing + Realsense Depth Sensing:
      1. Before Change:
        1. Nav Cam: 0.10-0.21s
        2. D435: 0.39-1.33s
        3. D405: 0.17-0.55s
        4. (Also, stream updates are received by the web app in batches, e.g., several nav frames, then several gripper and Realsense frames, etc.)
      2. After Change:
        1. Nav Cam: 0.04-0.10s
        2. D435: 0.16-0.26s
        3. D405: 0.12-0.19s
        4. (And the streams are nicely interspersed, not batched like above)
    6. Default Streams Again:
      1. Before Change:
        1. Nav Cam: 0.03-0.07s
        2. D435: 0.35-0.42s
        3. D405: 0.07-0.15s
      2. After Change:
        1. Nav Cam: 0.03-0.10s
        2. D435: 0.13-0.15s
        3. D405: 0.10-0.13s
    7. Overall Latency Takeaways:
      1. D435 is better across the board.
      2. Both D435 and D405 are better and more consistent (narrower latency range) when depth sensing is on.
  2. Measure CPU load before and after this change by running htop and reporting the name and CPU % of the top process. (I kept verbose logging on for this.)
    1. Default Streams:
      1. Before Change: 166%, headless firefox browser
      2. After Change: 135%, headless firefox browser
    2. Turn on Expanded Gripper View, Realsense Depth Sensing, and Gripper Depth Sensing.
      1. Before Change: 904%, configure_video_streams_depth. (Also note that after this, toggling gripper depth sensing off is not received by ROS.)
      2. After Change: 172%, headless firefox browser
  3. Keep the interface on for 10 minutes, then measure memory usage (as reported by htop) before and after this change, with default streams. (I turned verbose logging off for this.)
    1. Before Change: 5.45 - 6.21 GB / 15.2 GB
    2. After Change: 5.19 - 5.68 GB / 15.2 GB
