Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration
arXiv | Project Page | Video
[06/12/2024] 🔥🔥🔥 Background rendering sped up! 3D Gaussian Splatting is integrated as a background rendering engine, rendering 50 frames within 30 s.
[06/12/2024] 🔥🔥🔥 Foreground rendering sped up! Blender rendering now runs in multiple processes in parallel, rendering 50 frames within 5 minutes.
First clone this repo recursively:

```bash
git clone https://github.com/yifanlu0227/ChatSim.git --recursive
conda create -n chatsim python=3.9 git-lfs
conda activate chatsim
```
We offer two background rendering methods: McNeRF, used in our paper, and 3D Gaussian Splatting. McNeRF encodes the exposure time and achieves brightness-consistent rendering. 3D Gaussian Splatting renders much faster (about 50x) and has higher PSNR on training views; however, strong perspective shifts result in noticeable artifacts.

McNeRF

https://github.com/yifanlu0227/ChatSim/assets/45688237/6e7e4411-31e5-46e3-9ca2-be0d6e813a60

3D Gaussian Splatting

https://github.com/yifanlu0227/ChatSim/assets/45688237/e7ac487c-5615-455d-bb38-026aaaabce70

Installing either one is OK! If you want high rendering speed and do not care about brightness inconsistency, choose 3D Gaussian Splatting.
```bash
cd ../inpainting/Inpaint-Anything/
python -m pip install -e segment_anything

gdown https://drive.google.com/drive/folders/1wpY-upCo4GIW4wVPnlMh_ym779lLIG2A -O pretrained_models --folder
gdown https://drive.google.com/drive/folders/1SERTIfS7JYyOOmXWujAva4CDQf-W7fjv -O pytracking/pretrain --folder

cd ../latent-diffusion
pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
pip install -e git+https://github.com/openai/CLIP.git@main#egg=clip
pip install -e .

# download pretrained ldm
wget -O models/ldm/inpainting_big/last.ckpt https://heibox.uni-heidelberg.de/f/4d9ac7ea40c64582b7c9/?dl=1
```
We tested with Blender 3.5.1. Note that Blender 3+ requires Ubuntu version >= 20.04.
```bash
cd ../../Blender
wget https://download.blender.org/release/Blender3.5/blender-3.5.1-linux-x64.tar.xz
tar -xvf blender-3.5.1-linux-x64.tar.xz
rm blender-3.5.1-linux-x64.tar.xz
```

Locate Blender's internal Python, for example `blender-3.5.1-linux-x64/3.5/python/bin/python3.10`, and export it as an environment variable:

```bash
export blender_py=$PWD/blender-3.5.1-linux-x64/3.5/python/bin/python3.10
```
```bash
cd utils

# install dependency (use the -i https://pypi.tuna.tsinghua.edu.cn/simple if you are in the Chinese mainland)
$blender_py -m pip install -r requirements.txt
$blender_py -m pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

$blender_py setup.py develop
```
If you want smoother and more realistic trajectories, you can install the trajectory tracking module and change the parameter `motion_agent-motion_tracking` to `True` in the .yaml file. For installation (both code and pre-trained model), run the following commands in the terminal. This requires PyTorch >= 1.13.
```bash
pip install frozendict gym==0.26.2 stable-baselines3[extra] protobuf==3.20.1

cd chatsim/foreground
git clone --recursive git@github.com:MARMOTatZJU/drl-based-trajectory-tracking.git -b v1.0.0
cd drl-based-trajectory-tracking
source setup-minimum.sh
```
Then, when the parameter `motion_agent-motion_tracking` is set to `True`, each trajectory will be tracked by this module to make it smoother and more realistic.
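For reference, a minimal illustration of how this switch may look in the config yaml (the exact key nesting is an assumption; check your own config file, e.g. `config/waymo-1137.yaml`):

```yaml
# Illustrative excerpt only; the actual nesting of motion_agent in your yaml may differ.
agents:
  motion_agent:
    motion_tracking: True   # enable DRL-based trajectory tracking for smoother, more realistic motions
```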
If you want to train the skydome model, follow the README in `chatsim/foreground/mclight/skydome_lighting/readme.md`. You can also download our provided skydome HDRI in the next section and start the simulation.
```bash
mkdir data
mkdir data/waymo_tfrecords
mkdir data/waymo_tfrecords/1.4.2
```
Download the Waymo perception dataset v1.4.2 to `data/waymo_tfrecords/1.4.2`. In the Google Cloud console, the correct folder path is `waymo_open_dataset_v_1_4_2/individual_files/training` or `waymo_open_dataset_v_1_4_2/individual_files/validation`. Some static scenes we have used are listed here. Use Filter to find them quickly, or use gcloud to download them in batch.
If you have installed `gcloud`, you can download the above tfrecords via:

```bash
bash data_utils/download_waymo.sh data_utils/waymo_static_32.lst data/waymo_tfrecords/1.4.2
```
After downloading the tfrecords, you should see a folder structure like the following. If you download the tfrecord files from the console, you will also have prefixes like `individual_files_training_` or `individual_files_validation_`.
```
data
|-- ...
|-- ...
`-- waymo_tfrecords
    `-- 1.4.2
        |-- segment-10247954040621004675_2180_000_2200_000_with_camera_labels.tfrecord
        |-- segment-11379226583756500423_6230_810_6250_810_with_camera_labels.tfrecord
        |-- ...
        `-- segment-1172406780360799916_1660_000_1680_000_with_camera_labels.tfrecord
```
We extract the images, camera poses, LiDAR files, etc. from the tfrecord files with `data_utils/process_waymo_script.py`:
```bash
cd data_utils
python process_waymo_script.py --waymo_data_dir=../data/waymo_tfrecords/1.4.2 --nerf_data_dir=../data/waymo_multi_view
```
This will generate the data folder `data/waymo_multi_view`. The final data folder will look like:
```
data
`-- waymo_multi_view
    |-- ...
    `-- segment-1172406780360799916_1660_000_1680_000_with_camera_labels
        |-- 3d_boxes.npy              # 3d bounding boxes of the first frame
        |-- images                    # a clip of waymo images used in chatsim (typically 40 frames)
        |-- images_all                # full waymo images (typically 198 frames)
        |-- map.pkl                   # map data of this scene
        |-- point_cloud               # point cloud file of the first frame
        |-- cams_meta.npy             # camera ext&int calibrated by Metashape and transformed to the waymo coordinate system
        |-- cams_meta_metashape.npy   # camera ext&int calibrated by Metashape (intermediate file, relative scale, not required by simulation inference)
        |-- cams_meta_colmap.npy      # camera ext&int calibrated by COLMAP (intermediate file, relative scale, not required by simulation inference)
        |-- cams_meta_waymo.npy       # camera ext&int from the original waymo dataset (intermediate file, not required by simulation inference)
        |-- shutters                  # normalized exposure time (mean=0, std=1)
        |-- tracking_info.pkl         # tracking data
        |-- vehi2veh0.npy             # transformation matrix from the i-th frame's vehicle coordinate to the first frame's vehicle coordinate
        |-- camera.xml                # calibration file from Metashape (intermediate file, not required by simulation inference)
        `-- colmap/sparse_undistorted/[images/cams_meta.npy/points3D_waymo.ply]  # calibration files from COLMAP (intermediate file, only required for 3DGS rendering)
```
Coordinate Convention
- `point_cloud/000_xxx.pcd` files are in the ego vehicle's coordinate frame.
- `camera.xml` uses the RDF convention (x-right, y-down, z-front).
- `cams_meta.npy` uses the RUB convention (x-right, y-up, z-back).
- `vehi2veh0.npy` stores the transformation between vehicle coordinates; vehicle coordinates follow the FLU convention (x-front, y-left, z-up), as illustrated in the Waymo paper.
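As a rough sketch of how these conventions relate (assuming `vehi2veh0.npy` holds a 4x4 homogeneous matrix and that the RUB/RDF difference is only an axis flip):

```python
import numpy as np

# RUB (x-right, y-up, z-back) -> RDF (x-right, y-down, z-front): flip the y and z axes.
RUB_TO_RDF = np.diag([1.0, -1.0, -1.0])

def rub_rotation_to_rdf(R_rub):
    """Convert a 3x3 camera-to-world rotation from RUB to RDF camera axes."""
    return R_rub @ RUB_TO_RDF

def vehi_point_to_veh0(point_flu, vehi2veh0):
    """Map an FLU point from the i-th frame's vehicle frame to the first frame's (4x4 matrix assumed)."""
    return (vehi2veh0 @ np.append(point_flu, 1.0))[:3]
```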
`cams_meta.npy` instruction
`cams_meta.shape = (N, 27)`

- `cams_meta[:, 0:12]`: flattened camera poses in RUB; the world coordinate is the starting frame's vehicle coordinate.
- `cams_meta[:, 12:21]`: flattened camera intrinsics.
- `cams_meta[:, 21:25]`: distortion params `[k1, k2, p1, p2]`.
- `cams_meta[:, 25:27]`: bounds `[z_near, z_far]` (not used).
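A minimal sketch of unpacking these fields with NumPy (the scene path below is just an example; replace it with your own scene folder):

```python
import numpy as np

# Example scene folder; adjust to your own data.
scene = "data/waymo_multi_view/segment-1172406780360799916_1660_000_1680_000_with_camera_labels"
cams_meta = np.load(f"{scene}/cams_meta.npy")
assert cams_meta.shape[1] == 27

poses      = cams_meta[:, 0:12].reshape(-1, 3, 4)   # camera poses in RUB; world = first frame's vehicle coordinate
intrinsics = cams_meta[:, 12:21].reshape(-1, 3, 3)  # camera intrinsic matrices K
distortion = cams_meta[:, 21:25]                    # [k1, k2, p1, p2]
bounds     = cams_meta[:, 25:27]                    # [z_near, z_far] (not used)
```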
#### Download Blender 3D Assets
- [Blender 3D Assets](https://huggingface.co/datasets/yifanlu/Blender_3D_assets). Download them with the following commands and make sure they are in `data/blender_assets`.
```bash
# suppose you are in ChatSim/data
git lfs install
git clone https://huggingface.co/datasets/yifanlu/Blender_3D_assets
cd Blender_3D_assets
git lfs pull # about 1 GB. You might see `Error updating the Git index: (1/1), 1.0 GB | 7.4 MB/s` when `git lfs pull` finishes; it doesn't matter, please continue.
cd ..
mv Blender_3D_assets/assets.zip ./
unzip assets.zip
rm assets.zip
rm -rf Blender_3D_assets
mv assets blender_assets
```
Our 3D models are collected from the Internet. We tried our best to contact the author of the model and ensure that copyright issues are properly dealt with (our open-source projects are not for profit). If you are the author of a model and our behaviour infringes your copyright, please contact us immediately and we will delete the model.
#### Download Skydome HDRI
- [Skydome HDRI](https://huggingface.co/datasets/yifanlu/Skydome_HDRI/tree/main). Download with the following command and make sure they are in `data/waymo_skydome`.
```bash
# suppose you are in ChatSim/data
git lfs install
git clone https://huggingface.co/datasets/yifanlu/Skydome_HDRI
mv Skydome_HDRI/waymo_skydome ./
rm -rf Skydome_HDRI
```
You can also train the skydome estimation network yourself. Go to `chatsim/foreground/mclight/skydome_lighting` and follow `chatsim/foreground/mclight/skydome_lighting/readme.md` for the training.
Train either McNeRF or 3D Gaussian Splatting, depending on your installation.
Set the OpenAI API key as an environment variable. Also set `OPENAI_API_BASE` if you have network issues (especially in mainland China).

```bash
export OPENAI_API_KEY=<your api key>
```
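If you do need a proxy endpoint, it can be exported the same way; the URL below is only a placeholder, not a real service:

```bash
export OPENAI_API_BASE=https://your-openai-proxy.example.com/v1  # placeholder; use your own endpoint
```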
Now you can start the simulation with:

```bash
python main.py -y ${CONFIG YAML} \
               -p ${PROMPT} \
               [-s ${SIMULATION NAME}]
```
- `${CONFIG YAML}` specifies the scene information; the yaml files are stored in the `config` folder, e.g. `config/waymo-1137.yaml`.
- `${PROMPT}` is your input prompt, which should be wrapped in quotation marks, e.g. "add a straight driving car in the scene".
- `${SIMULATION NAME}` determines the name of the folder when saving results. The default is `demo`.
You can try:

```bash
# if you train nerf
python main.py -y config/waymo-1137.yaml -p "Add a Benz G in front of me, driving away fast."

# if you train 3DGS
python main.py -y config/3dgs-waymo-1137.yaml -p "Add a Benz G in front of me, driving away fast."
```
The rendered results are saved in `results/1137_demo_%Y_%m_%d_%H_%M_%S`. Intermediate files are saved in `results/cache/1137_demo_%Y_%m_%d_%H_%M_%S` for debugging and visualization if `save_cache` is enabled in `config/waymo-1137.yaml`.
`config/waymo-1137.yaml` contains a detailed explanation for each entry. We give some extra explanation below. Suppose the yaml is read into `config_dict`:
- `config_dict['scene']['is_wide_angle']` determines the rendering view. If set to `True`, we expand Waymo's intrinsics (width -> 3 x width) to render wide-angle images. Also note that `is_wide_angle = True` comes with `rendering_mode = 'render_wide_angle_hdr_shutter'`, while `is_wide_angle = False` comes with `rendering_mode = 'render_hdr_shutter'`.
- `config_dict['scene']['frames']` is the number of frames to render.
- `config_dict['agents']['background_rendering_agent']['nerf_quiet_render']` determines whether to silence McNeRF's output in the terminal. Set it to `False` to print the output for debugging.
- `config_dict['agents']['foreground_rendering_agent']['use_surrounding_lighting']` defines whether we use the surrounding lighting. Currently `use_surrounding_lighting = True` only takes effect when a single vehicle is added, because the HDRI is a global illumination in Blender and it is difficult to set a separate HDRI for each car. `use_surrounding_lighting = True` also leads to slow rendering, since it calls the NeRF `#frame` times. We set it to `False` in each default yaml.
- `config_dict['agents']['foreground_rendering_agent']['skydome_hdri_idx']` is the filename (without extension) chosen from `data/waymo_skydome/${SCENE_NAME}/`. By default it is the skydome HDRI estimated from the first frame (`'000'`), but you can manually select a better estimation from another frame. To view the HDRIs, we recommend the VERIV extension for VS Code and tev for a desktop environment.
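Putting these entries together, a hedged sketch of how they may appear in the yaml (key nesting mirrors the `config_dict` paths above; the frame count is only an example value; `config/waymo-1137.yaml` is the authoritative file):

```yaml
# Illustrative excerpt only; see config/waymo-1137.yaml for the complete file.
scene:
  is_wide_angle: False            # True triples the image width and switches to 'render_wide_angle_hdr_shutter'
  frames: 40                      # number of frames to render (example value)
agents:
  background_rendering_agent:
    nerf_quiet_render: True       # set to False to print McNeRF's output for debugging
  foreground_rendering_agent:
    use_surrounding_lighting: False   # True only takes effect with a single added vehicle and slows rendering
    skydome_hdri_idx: '000'       # HDRI filename (without extension) under data/waymo_skydome/${SCENE_NAME}/
```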
```bibtex
@InProceedings{wei2024editable,
    title     = {Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents},
    author    = {Yuxi Wei and Zi Wang and Yifan Lu and Chenxin Xu and Changxing Liu and Hao Zhao and Siheng Chen and Yanfeng Wang},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
}
```