Authors: Nathaniel Simon and Anirudha Majumdar
Intelligent Robot Motion Lab, Princeton University
Project Page | Paper (arXiv) | Video
MonoNav is a monocular navigation stack that uses RGB images and camera poses to generate a 3D reconstruction, enabling the use of conventional planning techniques. MonoNav leverages pre-trained depth-estimation (ZoeDepth) and off-the-shelf fusion (Open3D) to generate a real-time 3D reconstruction of the environment. At each planning step, MonoNav selects from a library of motion primitives to navigate collision-free towards the goal. While the robot executes each motion primitive, new images and poses are integrated into the reconstruction. In our paper, we demonstrate MonoNav on a 37 gram micro aerial vehicle (MAV) navigating hallways at 0.5 m/s (see project page and video).
This repository contains code to run the following:

- **MonoNav pipeline** (`mononav_cf.py`): Monocular hallway navigation using a Crazyflie + FPV camera, as seen in our paper. We encourage you to adapt this script to other vehicles / scenes!
We also offer scripts that break MonoNav into sub-parts, which can be run independently & offline:

- `collect_dataset.py`: Collect images and poses from your own camera / robot.
- `estimate_depth.py`: Estimate depths from RGB images using ZoeDepth.
- `fuse_depth.py`: Fuse the estimated depth images and camera poses into a 3D reconstruction.
- `simulate.py`: Step through the 3D reconstruction and visualize the motion primitives chosen by the MonoNav planner. This is a useful way to replay and debug MonoNav trials.

These scripts (run in sequence) form a demo, which we highly recommend trying before adapting MonoNav for your system. We include a sample dataset (`data/demo_hallway`), so no robot is needed to run the demo!
In addition, we include the following resources in the `utils/` directory:

- `utils/test_camera.py` to test your camera,
- `utils/generate_primitives.py` to generate and visualize new motion primitives,
- `utils/calibration/take_pictures.py` to take pictures of a calibration target,
- `utils/calibration/calibrate.py` to calibrate the camera with OpenCV and save the camera intrinsics to file,
- `utils/calibration/transform.py` to test the undistortion and transformation pipeline.

We hope you enjoy MonoNav!
Clone the repository and its submodules (ZoeDepth):
git clone --recurse-submodules https://github.com/natesimon/MonoNav.git
Install the dependencies from `environment.yml` using mamba (fastest):
mamba env create -n mononav --file environment.yml
mamba activate mononav
or conda:
conda env create -n mononav --file environment.yml
conda activate mononav
Note: If the installation gets stuck at `Solving environment: ...`, we recommend updating your system, re-installing conda / miniconda, or using mamba.
Tested on: (release / driver / GPU)
If you do not have access to a GPU, set `device = "CPU:0"` in `config.yml`. This will slow down both depth estimation and fusion, and may not be fast enough for real-time operation.
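For reference, the relevant fragment of `config.yml` might look like the following (YAML syntax for the `device` key named above; the commented GPU line assumes Open3D-style device strings):

```yaml
# Run depth estimation and fusion on CPU (slower; may not be real-time):
device: "CPU:0"
# device: "CUDA:0"  # if a CUDA-capable GPU is available
```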
We include a demo dataset (`data/demo_hallway`) to try MonoNav out of the box - no robot needed! From a series of (occasionally noisy) images and poses, we will transform the images, estimate depth, and fuse them into a 3D reconstruction.
The dataset includes just RGB images and camera poses from a Crazyflie (see Hardware):
├── <demo_hallway>
│ ├── <crazyflie_poses> # camera poses
│ ├── <crazyflie_rgb_images> # raw camera images
Set the dataset path in `config.yml`. By default, `data_dir: 'data/demo_hallway'`, but make sure to change this if you want to process your own dataset.
data_dir: 'data/demo_hallway' # change to whichever directory you want to process
To demonstrate ZoeDepth, run `python estimate_depth.py`. This reads in the Crazyflie images and transforms them to match the camera intrinsics used in the ZoeDepth training dataset. This is crucial for depth-estimation accuracy (see Calibration for more details). The transformed images are saved in `<kinect_rgb_images>` and used to estimate depth. The estimated depths are saved as numpy arrays and colormaps (for visualization) in `<kinect_depth_images>`. After running, take a look at the resulting images and note the loss of peripheral information as the raw images are undistorted.
To demonstrate fusion, run `python fuse_depth.py`. This script reads in the (transformed) images, poses, and depths, and integrates them using Open3D's TSDF fusion. After completion, a reconstruction should be displayed with coordinate frames marking the camera poses throughout the run. The reconstruction is saved to file as a VoxelBlockGrid (`vbg.npz`) and a pointcloud (`pointcloud.ply`, which can be opened using MeshLab).
Next, run `python simulate.py`. This loads the reconstruction (`vbg.npz`) and executes the MonoNav planner. The planner is executed at each of the camera poses, and does the following:

- loads the motion primitive library (`utils/trajlib`),
- calls `choose_primitive()` in `utils/utils.py`, which selects the primitive that makes the most progress towards `goal_position` while remaining at least `min_dist2obs` from all obstacles in the reconstruction.

`simulate.py` is useful for debugging and de-briefing, and also for anticipating how changes to the trajectory library or planner affect performance. For example, by changing `min_dist2obs` in `config.yml`, you can see how increasing or decreasing the distance threshold to obstacles affects planner performance.

Finally, try changing the motion primitives to see how they affect planner performance! To modify and generate the trajectory library, open `utils/generate_primitives.py`. Try changing `num_trajectories` from `7` to `11`, and run `generate_primitives.py`. This will display the new motion primitives and update the trajectory library. Note that each motion primitive is defined by a set of gentle turns: left, right, or straight. An "extension" segment is added to each primitive (but not flown) to encourage foresight in the planner. See our paper for more details. Feel free to re-run `simulate.py` to try out the new primitives!
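As a rough illustration of the planning step described above, here is a simplified, self-contained sketch (not the repository's implementation) of a constant-curvature primitive library and a `choose_primitive()`-style selection rule. The curvatures, speed, goal, and point-obstacle representation are all hypothetical:

```python
import numpy as np

def make_arc(curvature, speed=0.5, duration=2.0, n=20):
    """Roll out a constant-curvature arc in the x-y plane (x = forward)."""
    t = np.linspace(0.0, duration, n)
    if abs(curvature) < 1e-9:
        return np.stack([speed * t, np.zeros(n)], axis=1)
    radius = 1.0 / curvature
    theta = speed * t * curvature
    return np.stack([radius * np.sin(theta), radius * (1.0 - np.cos(theta))], axis=1)

# Hypothetical library of gentle turns: left, straight, right.
library = [make_arc(c) for c in (0.8, 0.0, -0.8)]

def choose_primitive(library, goal, obstacles, min_dist2obs=0.35):
    """Return the index of the collision-free primitive ending closest to the goal."""
    best_idx, best_cost = None, np.inf
    for i, traj in enumerate(library):
        # Reject any primitive that passes within min_dist2obs of an obstacle.
        dists = np.linalg.norm(traj[:, None, :] - obstacles[None, :, :], axis=2)
        if dists.min() < min_dist2obs:
            continue
        cost = np.linalg.norm(traj[-1] - goal)  # distance-to-goal at the endpoint
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx  # None means no safe primitive: stop

goal = np.array([5.0, 1.0])          # goal ahead and to the left (+y)
obstacles = np.array([[1.2, -0.1]])  # one obstacle almost directly ahead
print(choose_primitive(library, goal, obstacles))  # 0 (the left turn)
```

The straight primitive passes too close to the obstacle and is rejected, so the planner picks the turn whose endpoint is closest to the goal.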
The tutorial should result in the following additional files added to `data/demo_hallway`:
├── <demo_hallway>
│ ├── <kinect_rgb_images> # images transformed to match kinect intrinsics
│ ├── <kinect_depth_images> # estimated depth (.npy for fusion and .jpg for visualization)
│ ├── vbg.npz / pointcloud.ply # reconstructions generated by fuse_depth.py
If you are unable to execute the full tutorial but want to reference the output, you can download it from Google Drive. If you have made it through the tutorials, you can try MonoNav on your own dataset!
To run MonoNav on your own dataset, there are two crucial steps:

1. **Data collection:** We provide `collect_dataset.py`, which works for the Crazyflie, but you may have to modify it for your system. Ensure that you are transforming the pose (rotation + translation) into the Open3D frame correctly, and saving it in homogeneous form. See `get_crazyflie_pose` in `utils/utils.py` for reference. Make sure to update `data_dir` in `config.yml` to point to your collected dataset.
2. **Camera calibration:** Calibrate your camera (see Calibration) to produce a `.json` of camera intrinsics and distortion coefficients. `config.yml` should then be updated to point to the intrinsics json file path: `camera_calibration_path: 'utils/calibration/intrinsics.json'`.

With those steps complete, you can run `estimate_depth.py`, `fuse_depth.py`, and `simulate.py` to reconstruct and try MonoNav on your own dataset!
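As a reference for the homogeneous-form requirement, here is a minimal numpy sketch of packing a rotation and translation into a 4x4 pose matrix. The actual frame conventions live in `get_crazyflie_pose` in `utils/utils.py`; the yaw-only rotation here is purely illustrative:

```python
import numpy as np

def yaw_to_rotation(yaw):
    """3x3 rotation about the z-axis by `yaw` radians."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def to_homogeneous(R, t):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 pose."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Camera-to-world pose; TSDF integration typically expects the inverse
# (the world-to-camera extrinsic).
pose = to_homogeneous(yaw_to_rotation(np.pi / 2), np.array([1.0, 2.0, 0.5]))
extrinsic = np.linalg.inv(pose)
```

Check which direction (camera-to-world vs. world-to-camera) your fusion code expects before saving; inverting on the wrong side is a common source of garbled reconstructions.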
A key aspect of MonoNav is using a pre-trained depth estimation network on a different camera than the one used during training. The micro FPV camera (Wolfwhoop WT05) that we use has significant barrel distortion (fish-eye), so the images must first be undistorted to better match the camera intrinsics used to collect the training data. To maintain the metric depth-estimation accuracy of the model, we must transform the input image to match the intrinsics of the training dataset. ZoeDepth is trained on NYU-Depth-v2, which used the Microsoft Kinect.
The `transform_image()` function in `utils/utils.py` performs the transformation: resizing the image and undistorting it to match the Kinect's intrinsics.
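Ignoring lens distortion for a moment, the resizing half of that transformation has a simple effect on the intrinsics matrix, sketched below. The source camera values are hypothetical, and matching the Kinect also requires the undistortion and cropping that `transform_image()` performs:

```python
import numpy as np

def rescale_intrinsics(K, src_wh, dst_wh):
    """Scale pinhole intrinsics when resizing an image from src_wh to dst_wh."""
    sx = dst_wh[0] / src_wh[0]  # horizontal scale factor
    sy = dst_wh[1] / src_wh[1]  # vertical scale factor
    S = np.diag([sx, sy, 1.0])
    return S @ K  # scales fx, cx by sx and fy, cy by sy

# Hypothetical 1280x720 source camera, resized to the Kinect's 640x480.
K_src = np.array([[1000.0, 0.0, 639.5],
                  [0.0, 1000.0, 359.5],
                  [0.0, 0.0, 1.0]])
K_dst = rescale_intrinsics(K_src, (1280, 720), (640, 480))
# fx and cx are halved; fy and cy are scaled by 2/3.
```

Note that resizing alone cannot change the field of view or the fx/fy ratio, which is why the full pipeline also undistorts and crops.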
In `utils/calibration`, we provide scripts to generate `intrinsics.json` for your own camera. Steps to calibrate:

1. `take_pictures.py`: Take many pictures (recommended: 80+) of the chessboard by pressing the spacebar. Saves them to `utils/calibration/calibration_pictures/`.
2. `calibrate.py`: Based on the OpenCV sample. You need to provide several arguments, including the structure and dimensions of your chessboard target. Example:
MonoNav/utils/calibration$ python calibrate.py -w 6 -h 8 -t chessboard --square_size=35 ./calibration_pictures/frame*.jpg
The intrinsics are printed and saved to `utils/calibration/intrinsics.json`.

3. `transform.py`: This script loads the intrinsics from `intrinsics.json` and transforms your `calibration_pictures` to the Kinect's dimensions (640x480) and intrinsics. This operation may involve resizing your image. The transformed images are saved in `utils/calibration/transform_output` and should be inspected.

Finally, we recommend re-running calibration on `transform_output` to ensure that the intrinsics match the Kinect:
MonoNav/utils/calibration$ python calibrate.py -w 6 -h 8 -t chessboard --square_size=35 ./transform_output/frame*.jpg
`transform.py` will save the intrinsics of the transformed images to `check_intrinsics.json`, which should roughly match those of the Kinect:
[[525. 0. 319.5]
[ 0. 525. 239.5]
[ 0. 0. 1. ]]
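A quick way to sanity-check the re-calibrated values against that target is to compare entries with a relative tolerance. The tolerance and the example numbers below are made up; load your actual `check_intrinsics.json` however it is structured:

```python
import numpy as np

# Nominal Kinect intrinsics, matching the matrix above.
K_KINECT = np.array([[525.0, 0.0, 319.5],
                     [0.0, 525.0, 239.5],
                     [0.0, 0.0, 1.0]])

def close_to_kinect(K, tol=0.05):
    """True if every entry is within `tol` (relative) of the Kinect matrix."""
    rel = np.abs(K - K_KINECT) / np.maximum(np.abs(K_KINECT), 1.0)
    return bool(rel.max() <= tol)

# Hypothetical values recovered by re-running calibrate.py on transform_output:
K_check = np.array([[523.1, 0.0, 321.0],
                    [0.0, 526.8, 238.2],
                    [0.0, 0.0, 1.0]])
print(close_to_kinect(K_check))  # True
```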
To run MonoNav as shown in our paper, you need a monocular robot with pose (position & orientation) estimation. We used the Crazyflie 2.1 micro aerial vehicle modified with an FPV camera. Our hardware setup closely follows the one used in Princeton's Introduction to Robotics course. If you are using the Crazyflie, we recommend following the Bitcraze tutorials to ensure that the vehicle flies and communicates properly.
List of parts:
mononav_cf.py
The `mononav_cf.py` ("cf" for "Crazyflie") script performs the image transformation, depth estimation, fusion, and planning process simultaneously for goal-directed obstacle avoidance. If `FLY_CRAZYFLIE: True`, the Crazyflie will take off; if `False`, the pipeline will execute without the Crazyflie starting its motors (useful for testing).
After takeoff, the Crazyflie can be controlled manually by the following key commands:
w: choose MIDDLE primitive (typically FORWARD)
a: choose FIRST primitive (typically LEFT)
d: choose LAST primitive (typically RIGHT)
c: end control (stop and land)
q: end control immediately (EMERGENCY stop and land)
g: start MonoNav
Manual control is an excellent way to check that the pipeline is working, as it should produce a sensible reconstruction after landing. As mentioned in the paper, it is HIGHLY RECOMMENDED to manually fly forward 3x (press `w`, `w`, `w`) before starting MonoNav (press `g`). This is due to the narrow field of view of the transformed images, which discards peripheral information; to make an informed decision, the planner needs information collected 3 primitives ago.
After MonoNav is started, the Crazyflie will choose and execute primitives according to the planner. If collision seems imminent, you can manually choose a primitive (via `w`, `a`, `d`) or stop the planner (`c` or `q`). During the run, a `crazyflie_trajectories.csv` log is produced, which includes the `frame_number` and `time_elapsed` during replanning, as well as the `chosen_traj_idx`.
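As a small post-flight example, the log can be summarized with the standard `csv` module. The column names are as listed above, but the rows here are synthetic stand-ins, so adapt to your actual file:

```python
import csv
import io

# Synthetic stand-in for a crazyflie_trajectories.csv produced during a run.
log = io.StringIO(
    "frame_number,time_elapsed,chosen_traj_idx\n"
    "12,4.01,3\n"
    "24,6.53,2\n"
    "36,9.10,3\n"
)

# Count how often each primitive was chosen across replanning steps.
counts = {}
for row in csv.DictReader(log):
    idx = int(row["chosen_traj_idx"])
    counts[idx] = counts.get(idx, 0) + 1

print(counts)  # {3: 2, 2: 1}
```

Histograms like this make it easy to spot a planner that is, say, always turning one direction.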
MonoNav is a "work in progress" - there are many exciting directions for future work! If you find any bugs or implement any exciting features, please submit a pull request as we'd love to continue improving the system. Once you get MonoNav running on your robot, send us a video! We'd love to see it.
Areas of future work in the pipeline:
@inproceedings{simon2023mononav,
author = {Nathaniel Simon and Anirudha Majumdar},
title = {{MonoNav: MAV Navigation via Monocular Depth Estimation and Reconstruction}},
booktitle = {Symposium on Experimental Robotics (ISER)},
year = {2023},
url = {https://arxiv.org/abs/2311.14100}
}