Description
This PR addresses three issues related to efficiency:
Images in the web app have been arriving with higher latency than before.
When the gripper camera's depth-sensing overlay is enabled, CPU usage for that process spikes as high as 900%.
The web app's browser uses considerable memory.
The PR addresses each of the above issues as follows:
Latency:
Instead of treating the expanded gripper view as a separate camera stream / ROS topic, this PR has it reuse the same topic as the default gripper view, and adds a ROS service to toggle the expanded gripper view on/off.
Ensure all subscribers/publishers use a "best effort" reliability QoS and a history depth (queue size) of 1.
This PR also reduces the commanded FPS for the navigation camera from 100 to 15.
This PR splits each video stream into a separate node, to avoid callbacks from one video stream starving another video stream of threads.
This PR also adds a convenient way to measure latency: toggle verbose on here and then run pm2 log start_robot_browser. Note that this is only the latency between when librealsense gets the frame and when the robot browser receives it, but that is the bulk of the latency -- WebRTC latency (which can be checked at chrome://webrtc-internals/) is in the tens of ms.
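The effect of the history depth of 1 can be pictured with a toy queue (plain Python, not the ROS QoS API): when frames arrive faster than the subscriber consumes them, a depth-1 queue always hands over the freshest frame, while a deeper queue forces the subscriber to drain stale frames first.

```python
from collections import deque

# Model a subscriber whose callback is slower than the publisher:
# frames 0..99 all arrive before anything is consumed.
depth_one = deque(maxlen=1)    # history depth 1: keep only the latest frame
depth_ten = deque(maxlen=10)   # deeper history: stale frames pile up

for frame_id in range(100):
    depth_one.append(frame_id)
    depth_ten.append(frame_id)

# With depth 1, the next frame consumed is the freshest one...
print(depth_one[0])   # 99
# ...with depth 10, the subscriber must work through 9 stale frames first.
print(depth_ten[0])   # 90
```

This is why a shallow queue trades dropped frames for lower perceived latency, which is the right trade for live video.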
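The starvation problem that motivates one node per stream can be illustrated with plain Python thread pools (a toy model, not ROS executor code): a cheap callback queued behind an expensive one on a shared single-threaded worker inherits the expensive callback's latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_callback():
    # Stands in for an expensive callback, e.g. a depth-overlay computation.
    time.sleep(0.2)

def fast_callback():
    # Stands in for a cheap callback, e.g. forwarding a nav-camera frame.
    return "frame"

# One shared single-threaded worker: the cheap callback queues behind the slow one.
shared = ThreadPoolExecutor(max_workers=1)
start = time.monotonic()
shared.submit(slow_callback)
shared.submit(fast_callback).result()
shared_wait = time.monotonic() - start      # roughly 0.2 s: the fast stream is starved

# One worker per stream (separate nodes in this PR): the cheap callback runs immediately.
slow_pool = ThreadPoolExecutor(max_workers=1)
fast_pool = ThreadPoolExecutor(max_workers=1)
start = time.monotonic()
slow_pool.submit(slow_callback)
fast_pool.submit(fast_callback).result()
separate_wait = time.monotonic() - start    # roughly 0 s

print(shared_wait > separate_wait)          # True
```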
CPU Load:
This PR downsamples pointclouds before doing any matrix multiplications with them. To avoid a grainy-looking overlay, this PR also expands the pixels that are being overlaid.
This PR filters the gripper pointcloud to points within 30cm depth before deprojecting pixels, to further reduce the pointcloud size prior to matrix multiplication.
Overall, before doing any matrix multiplications, this PR reduces the Realsense pointcloud from ~45K to ~15K points, and the gripper pointcloud from ~92K to ~12K points.
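The two reductions above (uniform downsampling, then a depth cutoff, both applied before the transform) can be sketched in NumPy as follows; the stride, cutoff, and synthetic cloud are illustrative, not the PR's actual parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a gripper pointcloud: ~92K points with depth (z) in meters.
points = rng.uniform([-0.5, -0.5, 0.0], [0.5, 0.5, 1.0], size=(92_000, 3))

downsampled = points[::3]                     # uniform downsampling: keep every 3rd point
near = downsampled[downsampled[:, 2] < 0.30]  # depth cutoff: keep points within 30 cm

# Only now pay for the matrix multiplication (e.g., a camera-to-base transform;
# identity here as a placeholder).
transform = np.eye(4)
homogeneous = np.hstack([near, np.ones((len(near), 1))])
transformed = (transform @ homogeneous.T).T[:, :3]

# Both filters are O(N) masking; the expensive matmul runs on the small cloud.
print(points.shape[0], downsampled.shape[0], near.shape[0])
```

Ordering matters: filtering is cheap per point, so shrinking the cloud first makes the per-point cost of the matrix multiplication apply to far fewer points.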
Memory Usage:
Instead of directly setting the video source to the image data, this PR first converts it into a blob and then creates an object URL for the blob. This allows us to explicitly revoke the object URL before rendering the next image, which prevents the browser from caching the image.
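The object-URL pattern can be sketched as follows (the function and variable names are illustrative, not the PR's actual code):

```javascript
// Wrap each frame in a Blob, point the element at an object URL for it, and
// revoke the previous frame's URL so the browser can free that frame's memory.
let previousUrl = null;

function showFrame(element, jpegBytes) {
  const blob = new Blob([jpegBytes], { type: "image/jpeg" });
  const url = URL.createObjectURL(blob);
  if (previousUrl !== null) {
    URL.revokeObjectURL(previousUrl); // release the prior frame's buffer
  }
  element.src = url;
  previousUrl = url;
}

// Any object with a `src` property stands in for an <img>/<video> element here.
const fakeElement = {};
showFrame(fakeElement, new Uint8Array([0xff, 0xd8, 0xff])); // JPEG magic bytes
console.log(fakeElement.src.startsWith("blob:")); // true
```

Without the explicit revoke, each object URL keeps its blob alive for the lifetime of the page, so memory grows with every frame shown.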
Note that because this PR is the last of a 3-PR chain that will all be merged together (#52, #57, and this one), it also includes some final fixes for those two PRs:
Fixing the color convention for the beta teleop camera.
Adjusting the joint lift in click-to-pregrasp for objects near the ground, to avoid colliding with the base.
Monitoring joint efforts and terminating click-to-pregrasp if they exceed a threshold (indicating a collision).
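The effort check in the last item amounts to a simple threshold test; a sketch (the threshold value and joint names are illustrative, not the PR's actual code):

```python
EFFORT_THRESHOLD = 40.0  # illustrative; real thresholds depend on the joint and units

def should_terminate(efforts: dict) -> bool:
    """Terminate the motion if any joint's measured effort exceeds the
    threshold, which we take as an indication of a collision."""
    return any(abs(effort) > EFFORT_THRESHOLD for effort in efforts.values())

print(should_terminate({"joint_lift": 12.0, "joint_arm": 8.5}))   # False
print(should_terminate({"joint_lift": 55.0, "joint_arm": 8.5}))   # True
```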
Note on an additional possible speedup
We did notice that enabling multicast on the loopback interface, switching the ROS middleware to CycloneDDS, and forcing CycloneDDS to use loopback shaves tens of milliseconds off the latency (~30-50ms). However, this change prevents other computers on the network from accessing the ROS nodes/topics, which is why we didn't push it. For documentation purposes, the steps are as follows:
First, enable multicast on the loopback interface and install CycloneDDS:
```shell
sudo ifconfig lo multicast
sudo route add -net 224.0.0.0 netmask 240.0.0.0 dev lo
sudo apt install ros-humble-rmw-cyclonedds-cpp
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
cd ~/ament_ws
colcon build
```
Second, create an XML file with the following contents:
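A typical CycloneDDS configuration that restricts DDS traffic to the loopback interface looks like the following sketch (element names vary across CycloneDDS versions; verify against the version shipped with your ROS distro):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<CycloneDDS xmlns="https://cdds.io/config">
  <Domain Id="any">
    <General>
      <Interfaces>
        <NetworkInterface name="lo"/>
      </Interfaces>
      <AllowMulticast>true</AllowMulticast>
    </General>
  </Domain>
</CycloneDDS>
```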
Finally, tell CycloneDDS to use that configuration: export CYCLONEDDS_URI=<absolute_path_to_file>.xml
Note that the above commands need to be re-run every time you restart your robot. To make them permanent, add the two exports of environment variables to ~/.bashrc, and add the two network configuration commands to a shell script that runs on startup as root.
Testing procedure
(All tests done on stretch-se3-3001, with a nearly-fresh software install)
Measure latency before and after this change, with all three video streams on in the operator web interface. (For the "before" measurements, revert all of this PR except the verbose latency logging described above, and measure latency from those verbose logs.)
Default streams:
Before Change:
Nav Cam: 0.03-0.07s
D435: 0.32-0.38s
D405: 0.07-0.13s
After Change:
Nav Cam: 0.03-0.10s
D435: 0.08-0.11s
D405: 0.13-0.19s
Realsense Depth Sensing Only:
Before Change:
Nav Cam: 0.04-0.07s
D435: 0.53-0.59s
D405: 0.08-0.12s
After Change:
Nav Cam: 0.04-0.10s
D435: 0.16-0.26s
D405: 0.08-0.13s
Expanded Gripper Only (undo the above):
Before Change:
Nav Cam: 0.04-0.08s
D435: 0.35-0.42s
D405: 0.09-0.14s
After Change:
Nav Cam: 0.03-0.10s
D435: 0.14-0.18s
D405: 0.11-0.15s
Gripper Depth Sensing Only (undo the above):
Before Change:
Nav Cam: 0.08-0.13s
D435: 0.30-0.70s
D405: 0.21-0.40s
(Also, stream updates are received by the web app in batches, e.g., multiple nav stream, then multiple gripper and realsense, etc.)
After Change:
Nav Cam: 0.04-0.10s
D435: 0.11-0.15s
D405: 0.15-0.17s
(And the streams are nicely interspersed, not batched like above)
(Also, stream updates are received by the web app in batches, e.g., multiple nav stream, then multiple gripper and realsense, etc.)
After Change:
Nav Cam: 0.04-0.10s
D435: 0.16-0.26s
D405: 0.12-0.19s
(And the streams are nicely interspersed, not batched like above)
Default Streams Again:
Before Change:
Nav Cam: 0.03-0.07s
D435: 0.35-0.42s
D405: 0.07-0.15s
After Change:
Nav Cam: 0.03-0.10s
D435: 0.13-0.15s
D405: 0.10-0.13s
Overall Latency Takeaways:
D435 is better across the board.
Both D435 and D405 are better and more reliable (smaller range) when depth sensing is on.
Measure CPU load before and after this change by running htop and reporting the name and CPU % of the top process. (I kept verbose logging on for this.)
Default Streams:
Before Change: 166%, headless firefox browser
After Change: 135%, headless firefox browser
Turn on Expanded Gripper View, Realsense Depth Sensing, and Gripper Depth Sensing.
Before Change: 904%, configure_video_streams_depth. (Also note that after this, toggling gripper depth sensing off doesn't get received by ROS.)
After Change: 172%, headless firefox browser
Keep the interface on for 10 minutes, then measure memory usage (as reported by htop) before this change and after this change, with default streams. (I turned verbose logging off for this.)
Before opening a pull request
From the top-level of this repository, run:
```shell
pre-commit run --all-files
```
To merge
Squash & Merge