ziyc / drivestudio

A 3DGS framework for omni urban scene reconstruction and simulation.
https://ziyc.github.io/omnire/
MIT License
437 stars 29 forks source link

Logo

A 3DGS framework for omni urban scene reconstruction and simulation!

## About DriveStudio is a 3DGS codebase for urban scene reconstruction/simulation. It offers a system with multiple Gaussian representations to jointly reconstruct backgrounds, vehicles, and non-rigid categories (pedestrians, cyclists, etc.) from driving logs. DriveStudio also provides a unified data system supporting various popular driving datasets, including [Waymo](https://waymo.com/open/), [PandaSet](https://pandaset.org/), [Argoverse2](https://www.argoverse.org/av2.html), [KITTI](http://www.cvlibs.net/datasets/kitti/), [NuScenes](https://www.nuscenes.org/), and [NuPlan](https://www.nuscenes.org/nuplan). This codebase also contains the **official implementation** of: > **OmniRe: Omni Urban Scene Reconstruction**
[Project Page](https://ziyc.github.io/omnire/) | [Paper](https://arxiv.org/abs/2408.16760)
[Ziyu Chen](https://ziyc.github.io/), [Jiawei Yang](https://jiawei-yang.github.io/), [Jiahui Huang](https://huangjh-pub.github.io/), [Riccardo de Lutio](https://riccardodelutio.github.io/), [Janick Martinez Esturo](https://www.jme.pub/), [Boris Ivanovic](https://www.borisivanovic.com/), [Or Litany](https://orlitany.github.io/), [Zan Gojcic](https://zgojcic.github.io/), [Sanja Fidler](https://www.cs.utoronto.ca/~fidler/), [Marco Pavone](https://stanford.edu/~pavone/), [Li Song](https://medialab.sjtu.edu.cn/author/li-song/), [Yue Wang](https://yuewang.xyz/) # 🎉 Try your own projects/research on DriveStudio! ### 🔥 Highlighted implementations Our codebase supports two types of Gaussian trainers: 1. Single-Representation trainer (single Gaussian representation for the entire scene): - Deformable Gaussians - Periodic Vibration Gaussians 2. Multi-Representation trainer (Gaussian scene graphs trainer): - Background: Static Gaussians (Vanilla Gaussians) - Vehicles: Static Gaussians - Humans: SMPL-Gaussians, Deformable Gaussians - Other non-rigid categories: Deformable Gaussians **Implemented methods:** | Method Name | Implementation | Trainer Type | Gaussian Representations | |-------------|----------------|--------------|--------------------------| | [OmniRe](https://ziyc.github.io/omnire/) | Official | Multi | • Static Gaussians: Background, Vehicles
• SMPL Gaussians: Pedestrians (majority)
• Deformable Gaussians: Cyclists, far-range pedestrians, other non-rigid categories | | [Deformable-GS](https://github.com/ingra14m/Deformable-3D-Gaussians) | Unofficial | Single | • Deformable Gaussians: Entire scene | | [PVG](https://github.com/fudan-zvg/PVG) | Unofficial | Single | • Periodic Vibration Gaussians: Entire scene | | [Street Gaussians](https://github.com/zju3dv/street_gaussians) | Unofficial | Multi | • Static Gaussians: Background, Vehicles | We extend our gratitude to the authors for their remarkable contributions. If you find these works useful, please consider citing them. ### 🚗 Dataset Support This codebase provides support for popular driving datasets. We offer instructions and scripts on how to download and process these datasets: | Dataset | Instruction | Cameras | Sync Frequency | Object Annotation | |---------|-------------|---------|----------------|-------------------| | Waymo | [Data Process Instruction](docs/Waymo.md) | 5 cameras | 10Hz | ✅ | | NuScenes | [Data Process Instruction](docs/NuScenes.md) | 6 cameras | 2Hz (up to 10Hz*) | ✅ | | NuPlan | [Data Process Instruction](docs/Nuplan.md) | 8 cameras | 10Hz | ✅ | | ArgoVerse | [Data Process Instruction](docs/ArgoVerse.md) | 7 cameras | 10Hz | ✅ | | PandaSet | [Data Process Instruction](docs/Pandaset.md) | 6 cameras | 10Hz | ✅ | | KITTI | [Data Process Instruction](docs/KITTI.md) | 2 cameras | 10Hz | ✅ | *NOTE: For NuScenes data, LiDAR operates at 20Hz and cameras at 12Hz, but keyframes (with object annotations) are only at 2Hz. We provide a method to interpolate annotations up to 10Hz. ### ✨ Functionality
Click to expand functionality details We have implemented interesting and useful functionalities: 1. **Flexible multi-camera training:** Choose any combination of cameras for training - single, multiple, or all. You can set these up by **SIMPLY** configuring your selection in the config file. 2. **Powered by gsplat** Integrated [gsplat](https://github.com/nerfstudio-project/gsplat) rasterization kernel with its advanced functions, e.g. absolute gradients, anti-aliasing, etc. 3. **Camera Pose Refinement:** Recognizing that camera poses may not always be sufficiently accurate, we provide a method to refine and optimize these poses. 4. **Objects' GT Bounding Box Refinement:** To address noise in ground truth boxes, we've added this feature to further improve accuracy and robustness. 5. **Affine Transformation:** This feature handles camera exposure and other related issues, enhancing the quality of scene reconstruction. 6. ... These functionalities are designed to enhance the overall performance and flexibility of our system, allowing for more accurate and adaptable scene reconstruction across various datasets and conditions.
## 📢 Updates **[Aug 2024]** Release code of DriveStudio. ## 🔨 Installation Run the following commands to set up the environment: ```shell # Clone the repository with submodules git clone --recursive https://github.com/ziyc/drivestudio.git cd drivestudio # Create the environment conda create -n drivestudio python=3.9 -y conda activate drivestudio pip install -r requirements.txt pip install git+https://github.com/nerfstudio-project/gsplat.git@v1.3.0 pip install git+https://github.com/facebookresearch/pytorch3d.git pip install git+https://github.com/NVlabs/nvdiffrast # Set up for SMPL Gaussians cd third_party/smplx/ pip install -e . cd ../.. ``` ## 📊 Prepare Data We support most popular public driving datasets. Detailed instructions for downloading and processing each dataset are available in the following documents: - Waymo: [Data Process Instruction](docs/Waymo.md) - NuScenes: [Data Process Instruction](docs/NuScenes.md) - NuPlan: [Data Process Instruction](docs/Nuplan.md) - ArgoVerse: [Data Process Instruction](docs/ArgoVerse.md) - PandaSet: [Data Process Instruction](docs/Pandaset.md) - KITTI: [Data Process Instruction](docs/KITTI.md) ## 🚀 Running ### Training ```shell export PYTHONPATH=$(pwd) start_timestep=0 # start frame index for training end_timestep=-1 # end frame index, -1 for the last frame python tools/train.py \ --config_file configs/omnire.yaml \ --output_root $output_root \ --project $project \ --run_name $expname \ dataset=waymo/3cams \ data.scene_idx=$scene_idx \ data.start_timestep=$start_timestep \ data.end_timestep=$end_timestep ``` - To run other methods, change `--config_file`. See `configs/` for more options. - Specify dataset and number of cameras by setting `dataset`. Examples: `waymo/1cams`, `waymo/5cams`, `pandaset/6cams`, `argoverse/7cams`, etc. You can set up arbitrary camera combinations for each dataset. See `configs/datasets/` for custom configuration details. - For over 3 cameras or 450+ images, we recommend using `omnire_extended_cam.yaml`. It works better in practice. ### Evaluation ```shell python tools/eval.py --resume_from $ckpt_path ``` ## 👏 Contributions We're improving our project to develop a robust driving recom/sim system. Some areas we're focusing on: - A real-time viewer for background and foreground visualization - Scene editing and simulation tools - Other Gaussian representations (e.g., 2DGS, surfels) We welcome pull requests and collaborations. If you'd like to contribute or have questions, feel free to open an issue or contact [Ziyu Chen](https://github.com/ziyc) (ziyu.sjtu@gmail.com). ## 🙏 Acknowledgments We utilize the rasterization kernel from [gsplat](https://github.com/nerfstudio-project/gsplat). Parts of our implementation are based on work from [EmerNeRF](https://github.com/NVlabs/EmerNeRF), [NerfStudio](https://github.com/nerfstudio-project/nerfstudio), [GART](https://github.com/JiahuiLei/GART), and [Neuralsim](https://github.com/PJLab-ADG/neuralsim). We've also implemented unofficial versions of [Deformable-GS](https://github.com/ingra14m/Deformable-3D-Gaussians), [PVG](https://github.com/fudan-zvg/PVG), and [Street Gaussians](https://github.com/zju3dv/street_gaussians), with reference to their original codebases. We extend our deepest gratitude to the authors for their contributions to the community, which have greatly supported our research. ## Citation ``` @article{chen2024omnire, title={OmniRe: Omni Urban Scene Reconstruction}, author={Chen, Ziyu and Yang, Jiawei and Huang, Jiahui and Lutio, Riccardo de and Esturo, Janick Martinez and Ivanovic, Boris and Litany, Or and Gojcic, Zan and Fidler, Sanja and Pavone, Marco and Song, Li and Wang, Yue}, journal={arXiv preprint arXiv:2408.16760}, year={2024} } ```