Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

Official Repository of Panacea.

[Paper] Panacea: Panoramic and Controllable Video Generation for Autonomous Driving,
Yuqing Wen1, Yucheng Zhao2, Yingfei Liu2, Fan Jia2, Yanhui Wang1, Chong Luo1, Chi Zhang3, Tiancai Wang2‡, Xiaoyan Sun1‡, Xiangyu Zhang2
1University of Science and Technology of China, 2MEGVII Technology, 3Mach Drive
*Equal contribution. This work was done during an internship at MEGVII. ‡Corresponding authors.

[Paper] Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving,
Yuqing Wen1, Yucheng Zhao2, Yingfei Liu2, Binyuan Huang4, Fan Jia2, Yanhui Wang1, Chi Zhang3, Tiancai Wang2‡, Xiaoyan Sun1‡, Xiangyu Zhang2
1University of Science and Technology of China, 2MEGVII Technology, 3Mach Drive, 4Wuhan University
*Equal contribution. This work was done during an internship at MEGVII. ‡Corresponding authors.

[WebPage] https://panacea-ad.github.io/


News

Getting Started

Please follow our documentation step by step.

Environment Setup

Follow the instructions in Environment Setup.

Prepare dataset

Prepare the real dataset following the instructions in Data Preparation.

Remember to put the dataset under the path data/nuscenes.
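
Before moving on, a small script like the one below can confirm the dataset is in the expected place. This is an illustrative sketch, not part of the repository; it assumes the standard nuScenes v1.0-trainval layout, so adjust the folder list to whatever Data Preparation produces.

```python
# Illustrative sanity check for the dataset location (not part of this repository).
from pathlib import Path

DATA_ROOT = Path("data/nuscenes")
# Standard top-level nuScenes folders (assumed; adjust to the Data Preparation doc).
for name in ["samples", "sweeps", "maps", "v1.0-trainval"]:
    path = DATA_ROOT / name
    print(f"{'ok' if path.is_dir() else 'MISSING':>7}  {path}")
```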

Download pretrained checkpoint

Download the weights of the second stage from panaceaplus_40k_deepspeed.ckpt

Put it in the checkpoints/ folder.
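
As a quick check (an illustrative snippet, not repository code), you can confirm the checkpoint is where the inference command below expects it:

```python
# Illustrative check that the second-stage checkpoint is in place (not repo code).
from pathlib import Path

ckpt = Path("checkpoints/panaceaplus_40k_deepspeed.ckpt")
if ckpt.is_file():
    print(f"found {ckpt} ({ckpt.stat().st_size / 1e9:.1f} GB)")
else:
    print(f"{ckpt} is missing -- download it and place it under checkpoints/")
```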

Inference

Run the following command to run stage-2 inference on the whole training or validation set of nuScenes:

python -m torch.distributed.launch --nproc_per_node=8 --master_port=1238 inference.py --base configs/inference_nuscenes.yaml --ckpt checkpoints/panaceaplus_40k_deepspeed.ckpt --split train --use_last_frame true --name EXP_NAME --bs 1

--split: selects the train or val set.

--use_last_frame true: uses the last frame as the conditional image.

Generating Multi-View and Controllable Videos for Autonomous Driving

Overview of Panacea. (a). The diffusion training process of Panacea, enabled by a diffusion encoder and decoder with the decomposed 4D attention module. (b). The decomposed 4D attention module comprises three components: intra-view attention for spatial processing within individual views, cross-view attention to engage with adjacent views, and cross-frame attention for temporal processing. (c). Controllable module for the integration of diverse signals. The image conditions are derived from a frozen VAE encoder and combined with diffused noises. The text prompts are processed through a frozen CLIP encoder, while BEV sequences are handled via ControlNet. (d). The details of BEV layout sequences, including projected bounding boxes, object depths, road maps and camera pose.
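
To make the decomposed 4D attention concrete, here is a minimal PyTorch sketch of the idea. It is not the released implementation: the tensor layout (batch, frames, views, tokens, channels), the module names, and the use of full cross-view attention instead of the adjacent-views-only variant described above are simplifying assumptions.

```python
# Minimal sketch of decomposed 4D attention (illustrative, not the official code).
import torch
import torch.nn as nn


class Decomposed4DAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # intra-view: spatial self-attention inside each camera view
        self.intra_view = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # cross-view: attention across camera views (the paper restricts this to adjacent views)
        self.cross_view = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # cross-frame: temporal attention across video frames
        self.cross_frame = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, V, N, C) = batch, frames, camera views, spatial tokens, channels
        B, T, V, N, C = x.shape

        # 1) intra-view attention: sequence of N spatial tokens per (batch, frame, view)
        h = x.reshape(B * T * V, N, C)
        h = h + self.intra_view(h, h, h, need_weights=False)[0]

        # 2) cross-view attention: sequence of V views per (batch, frame, spatial token)
        h = h.reshape(B, T, V, N, C).permute(0, 1, 3, 2, 4).reshape(B * T * N, V, C)
        h = h + self.cross_view(h, h, h, need_weights=False)[0]

        # 3) cross-frame attention: sequence of T frames per (batch, view, spatial token)
        h = h.reshape(B, T, N, V, C).permute(0, 3, 2, 1, 4).reshape(B * V * N, T, C)
        h = h + self.cross_frame(h, h, h, need_weights=False)[0]

        # restore (B, T, V, N, C)
        return h.reshape(B, V, N, T, C).permute(0, 3, 1, 2, 4)


if __name__ == "__main__":
    feats = torch.randn(1, 2, 6, 16, 64)  # 2 frames, 6 surround cameras, 16 tokens, 64 dims
    print(Decomposed4DAttention(dim=64)(feats).shape)  # torch.Size([1, 2, 6, 16, 64])
```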

The two-stage inference pipeline of Panacea. Its two-stage process begins by creating multi-view images with BEV layouts, followed by using these images, along with subsequent BEV layouts, to facilitate the generation of following frames.
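
In pseudo-code, the two-stage rollout reads roughly as follows. stage_one and stage_two are illustrative stubs standing in for the image and video models; they are not functions from this repository.

```python
# Hypothetical pseudo-code for the two-stage pipeline described above.
# stage_one / stage_two are illustrative stubs, NOT functions from this repository.

def stage_one(layout):
    """Stub for the image model: synthesize one multi-view frame from a BEV layout."""
    return {"layout": layout}

def stage_two(cond_image, layouts):
    """Stub for the video model: generate frames from the conditional image + layouts."""
    return [{"layout": l, "cond": cond_image["layout"]} for l in layouts]

def generate_video(bev_layouts, clip_len=8):
    # Stage 1: create the first multi-view frame from its BEV layout.
    frames = [stage_one(bev_layouts[0])]
    # Stage 2: repeatedly use the last generated frame as the conditional image,
    # together with the subsequent BEV layouts, to generate the following frames.
    for t in range(1, len(bev_layouts), clip_len):
        frames.extend(stage_two(frames[-1], bev_layouts[t:t + clip_len]))
    return frames

print(len(generate_video(list(range(9)))))  # 9 frames for 9 BEV layouts
```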

🎬   BEV-guided Video Generation   🎬

Controllable multi-view video generation. Panacea is able to generate realistic, controllable videos with good temporal and view consistency.

🎞   Attribute Controllable Video Generation   🎞

Video generation with variable attribute controls, such as weather, time, and scene, allows Panacea to simulate a variety of rare driving scenarios, including extreme weather conditions such as rain and snow, thereby greatly enhancing the diversity of the data.

🔥   Benefiting Autonomous Driving   🔥

(a). Panoramic video generation based on BEV (Bird’s-Eye-View) layout sequence facilitates the establishment of a synthetic video dataset, which enhances perceptual tasks. (b). Producing panoramic videos with conditional images and BEV layouts can effectively elevate image-only datasets to video datasets, thus enabling the advancement of video-based perception techniques.

BibTex

@inproceedings{wen2024panacea,
  title={Panacea: Panoramic and controllable video generation for autonomous driving},
  author={Wen, Yuqing and Zhao, Yucheng and Liu, Yingfei and Jia, Fan and Wang, Yanhui and Luo, Chong and Zhang, Chi and Wang, Tiancai and Sun, Xiaoyan and Zhang, Xiangyu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={6902--6912},
  year={2024}
}
@misc{wen2024panaceapanoramiccontrollablevideo,
  title={Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving},
  author={Yuqing Wen and Yucheng Zhao and Yingfei Liu and Binyuan Huang and Fan Jia and Yanhui Wang and Chi Zhang and Tiancai Wang and Xiaoyan Sun and Xiangyu Zhang},
  year={2024},
  eprint={2408.07605},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2408.07605}
}

Contact

Feel free to contact us at wenyuqing AT mail.ustc.edu.cn or wangtiancai AT megvii.com

Acknowledgement

This code builds on Stability-AI, ControlNet and StreamPETR. Thanks for open-sourcing!