ECCV 2024

Shaowei Liu · Zhongzheng Ren · Saurabh Gupta* · Shenlong Wang*
This repository contains the PyTorch implementation for the paper PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation, ECCV 2024. In this paper, we present a novel training-free image-to-video generation pipeline that integrates physical simulation with a generative video diffusion prior.
```bash
git clone --recurse-submodules https://github.com/stevenlsw/physgen.git
cd physgen
conda create -n physgen python=3.9
conda activate physgen
pip install -r requirements.txt
```
Run our Colab notebook for a quick start!
```bash
export PYTHONPATH=$(pwd)
name="pool"
python simulation/animate.py --data_root data --save_root outputs --config data/${name}/sim.yaml
```
The output video is saved at `outputs/${name}/composite.mp4`. Try setting `name` to `domino`, `balls`, `pig_ball`, or `car` to explore other scenes. The example outputs are shown below:
(Example outputs: input image, simulation, and output video for each scene.)
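To run all example scenes in one pass, a simple shell loop over the scene names works (assuming the default `data`/`outputs` layout from the demo command above):

```bash
# Run the simulation demo for each bundled example scene.
for name in pool domino balls pig_ball car; do
  python simulation/animate.py --data_root data --save_root outputs --config data/${name}/sim.yaml
done
```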
(Perception results: input, segmentation, normal, albedo, shading, and inpainting for each example.)
Simulation requires the following input for each image:
```
image folder/
├── original.png     # input image
├── mask.png         # segmentation mask
├── inpaint.png      # background inpainting
└── sim.yaml         # simulation configuration file
```
`sim.yaml` specifies the physical properties of each object and the initial conditions (the force and speed applied to each object). Please see `data/pig_ball/sim.yaml` for an example. Set `display` to `true` to visualize the simulation process on a display device, and set `save_snapshot` to `true` to save simulation snapshots.
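As a rough illustration only (the object fields below are assumptions, not the repository's actual schema; only `display` and `save_snapshot` are named above, so consult `data/pig_ball/sim.yaml` for the real format), a scene config might look like:

```bash
# Hypothetical sketch of a sim.yaml; `data/my_scene` and the object fields are illustrative.
cat > data/my_scene/sim.yaml <<'EOF'
display: false        # set to true to visualize the simulation on a display device
save_snapshot: false  # set to true to save simulation snapshots
objects:              # illustrative: per-object physical properties and initial conditions
  - mass: 1.0
    friction: 0.5
    init_force: [0.0, 0.0]
    init_speed: [2.0, 0.0]
EOF
```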
Run the simulation with the following command:
```bash
cd simulation
python animate.py --data_root ../data --save_root ../outputs --config ../data/${name}/sim.yaml
```
The outputs are saved in `outputs/${name}` as follows:
```
output folder/
├── history.pkl      # simulation history
├── composite.mp4    # composite video
├── composite.pt     # composite video tensor
├── mask_video.pt    # foreground masked video tensor
└── trans_list.pt    # objects transformation list tensor
```
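To sanity-check the saved tensors, `torch.load` can be used directly; the exact tensor layouts are not documented here, so the shape printout is just for inspection:

```bash
# Print the type and shape (if any) of each saved tensor for the `pool` scene.
for f in composite.pt mask_video.pt trans_list.pt; do
  python -c "import torch; x = torch.load('outputs/pool/$f'); print('$f', type(x), getattr(x, 'shape', None))"
done
```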
Relighting requires the following input:
```
image folder/
├── normal.npy       # normal map
└── shading.npy      # shading map by intrinsic decomposition

previous output folder/
├── composite.pt     # composite video tensor
├── mask_video.pt    # foreground masked video tensor
└── trans_list.pt    # objects transformation list tensor
```
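Before running relighting, a quick shell check can confirm that the required inputs above exist (paths assume the default layout used throughout this README):

```bash
# Verify the perception maps and previous simulation outputs are in place.
for f in data/${name}/normal.npy data/${name}/shading.npy \
         outputs/${name}/composite.pt outputs/${name}/mask_video.pt outputs/${name}/trans_list.pt; do
  [ -f "$f" ] || echo "missing: $f"
done
```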
`--perception_input` is the image folder containing the perception results. `--previous_output` is the output folder from the previous simulation step.

```bash
cd relight
python relight.py --perception_input ../data/${name} --previous_output ../outputs/${name}
```
`relight.mp4` and `relight.pt` are the relit video and its tensor. Comparison between the composite video and the relit video:

(Comparison: input image, composite video, and relight video for each scene.)
Download the SEINE model following the instructions below:
```bash
# install git-lfs beforehand
mkdir -p diffusion/SEINE/pretrained
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 diffusion/SEINE/pretrained/stable-diffusion-v1-4
wget -P diffusion/SEINE/pretrained https://huggingface.co/Vchitect/SEINE/resolve/main/seine.pt
```
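After downloading, the SEINE checkpoint and Stable Diffusion v1.4 weights should sit under `diffusion/SEINE/pretrained`; a quick listing confirms the download succeeded:

```bash
# Both the SEINE checkpoint and the Stable Diffusion v1.4 weights should be present.
ls -lh diffusion/SEINE/pretrained/seine.pt
ls diffusion/SEINE/pretrained/stable-diffusion-v1-4 | head
```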
The video diffusion rendering requires the following input:
```
image folder/
├── original.png     # input image
└── sim.yaml         # simulation configuration file (optional)

previous output folder/
├── relight.pt       # relighted video tensor
└── mask_video.pt    # foreground masked video tensor
```
Run the video diffusion rendering with the following command:
```bash
cd diffusion
python video_diffusion.py --perception_input ../data/${name} --previous_output ../outputs/${name}
```
`denoise_strength` and `prompt` can be adjusted in the above script. `denoise_strength` controls the amount of noise added: 0 means no denoising, and 1 means denoising from scratch, with large variance from the input image. `prompt` is the input prompt for the video diffusion model; by default we use the foreground object names from the perception model as the prompt.
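If these are exposed as command-line flags (the flag names below are assumptions; they may instead be variables to edit inside `video_diffusion.py`), an adjusted run might look like:

```bash
# Hypothetical flags; check video_diffusion.py for how these options are actually set.
python video_diffusion.py --perception_input ../data/pig_ball --previous_output ../outputs/pig_ball \
    --denoise_strength 0.4 --prompt "a pig and a ball"
```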
The output `final_video.mp4` is the final rendered video.
Comparison between the relit video and the diffusion-rendered video:

(Comparison: input image, relight video, and final video for each scene.)
We integrate simulation, relighting, and video diffusion rendering into a single script. Please follow the Video Diffusion Rendering instructions above to download the SEINE model first.
```bash
bash scripts/run_demo.sh ${name}
```
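To reproduce all example scenes end to end, loop over the same scene names as in the quick demo:

```bash
# Run the full pipeline (simulation, relighting, diffusion rendering) for every example scene.
for name in pool domino balls pig_ball car; do
  bash scripts/run_demo.sh ${name}
done
```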
If you find our work useful in your research, please cite:
```bibtex
@inproceedings{liu2024physgen,
  title={PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation},
  author={Liu, Shaowei and Ren, Zhongzheng and Gupta, Saurabh and Wang, Shenlong},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024}
}
```