jiaxilv / GPT4Motion

https://gpt4motion.github.io/

:basketball: GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

https://github.com/jiaxilv/GPT4Motion/assets/67190845/f4309a24-fb8d-4800-8709-5794adf338e3

We introduce GPT4Motion, a training-free framework that leverages the planning capability of large language models such as GPT, the physical simulation strength of Blender, and the excellent image generation ability of text-to-image diffusion models to enhance the quality of video synthesis.

Paper | Project Website

Jiaxi Lv*, Yi Huang*, Mingfu Yan*, Jiancheng Huang, Jianzhuang Liu, Yifan Liu, Yafei Wen, Xiaoxin Chen, Shifeng Chen

Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, VIVO AI Lab

News!!! :fire::fire::fire:

Overview


First, the user prompt is inserted into our designed prompt template. Then, the Python script generated by GPT-4 drives the Blender physics engine to simulate the corresponding motion, producing sequences of edge maps and depth maps. Finally, two ControlNets use these maps to constrain the physical motion of the video frames generated by Stable Diffusion, while a temporal consistency constraint enforces coherence across frames.
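The repository's `VideoGeneration/utils/Cross_Frame_Attention.py` suggests that the temporal consistency constraint is realized with cross-frame attention. As a rough illustration only, the sketch below shows one common form of it (popularized by Text2Video-Zero), in which every frame's queries attend to the keys and values of the first frame instead of its own. Whether GPT4Motion uses exactly this variant is an assumption, and the function names and `(frames, tokens, dim)` shapes are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_frame_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Hypothetical sketch; q, k, v have shape (frames, tokens, dim).

    Each frame's queries attend to the keys/values of frame 0, so every
    frame draws its appearance from the first frame.
    """
    k0 = np.broadcast_to(k[:1], k.shape)  # reuse first-frame keys for all frames
    v0 = np.broadcast_to(v[:1], v.shape)  # reuse first-frame values for all frames
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = (q @ k0.transpose(0, 2, 1)) * scale  # (frames, tokens, tokens)
    return softmax(scores, axis=-1) @ v0          # (frames, tokens, dim)
```

Sharing the first frame's keys and values is what ties the frames together; for frame 0 itself the operation reduces to ordinary self-attention.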

Performance

Comparisons with Baselines

https://github.com/jiaxilv/GPT4Motion/assets/67190845/83ce746d-f2e0-4e40-bc0d-cd572b6b867e

Comparison of the video results generated by different text-to-video models with the prompt "A white flag flaps in the wind".

https://github.com/jiaxilv/GPT4Motion/assets/67190845/9780d680-888f-4fe9-8371-cbc0a1c69801

Comparison of the video results generated by different text-to-video models with the prompt "Water flows into a white mug on a table, top-down view".

Controlling Physical Properties

https://github.com/jiaxilv/GPT4Motion/assets/67190845/0724e560-606c-409e-b6ec-038141a6055c

GPT4Motion's results on basketball drop and collision.

https://github.com/jiaxilv/GPT4Motion/assets/67190845/71513144-57ba-4743-807c-6f7ebd9c9eb2

GPT4Motion's results on "A white flag flaps in light / the / strong wind" (three wind strengths).

https://github.com/jiaxilv/GPT4Motion/assets/67190845/b3f2e66c-65b4-4a2f-b656-dd163e28c5b7

GPT4Motion's results on "A white T-shirt flutters in light / the / strong wind" (three wind strengths).

https://github.com/jiaxilv/GPT4Motion/assets/67190845/753a7a15-28a2-40cf-818e-dd96e29713c7

GPT4Motion's results on "Water / Viscous / Very viscous flows into a white mug on a table, top-down view" (three viscosity settings).

Directory of our code

For ease of reading, we list our directory structure.

├── data
│   └── basketball
│       └── A basketball free falls in the air
│           ├── depth
│           │   ├── depth_0000.png
│           │   └── ... (more depth images)
│           └── freestyle
│               ├── canny_0000.png
│               └── ... (more canny images)
├── PhysicsGeneration
│   ├── BlenderTool
│   │   ├── assets
│   │   │   └── basketball.obj
│   │   ├── __init__.py
│   │   └── utils.py
│   ├── prompt_for_GPT4.txt
│   └── script.py
├── README.MD
└── VideoGeneration
    ├── config
    │   └── basketball.yaml
    ├── main.py
    ├── requirements.txt
    └── utils
        ├── Cross_Frame_Attention.py
        ├── __init__.py
        └── utils_all.py

Reproducing Our Work

Generation of Motion Edge Maps and Depth Maps with GPT-4 and Blender

Please install Blender 3.6 by following the instructions at https://www.blender.org/download/.

cd PhysicsGeneration

Copy the prompt from "prompt_for_GPT4.txt" into GPT-4, then add the following prefix to the beginning of the Python code that GPT-4 generates:

import bpy
import os
import math
import random
from random import uniform
import mathutils

from sys import path
path.append(bpy.path.abspath("./"))
from BlenderTool.utils import *

ASSETS_PATH = 'BlenderTool/assets/'
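If you prefer not to paste the prefix by hand, a small helper like the one below could prepend it to GPT-4's reply and write the result to disk. This is a hypothetical convenience, not part of the repository: `PREFIX` mirrors the header above, while `strip_code_fence`, `build_script`, `write_script`, and the assumption that GPT-4 wraps its code in a Markdown fence are illustrative.

```python
import re
from pathlib import Path

# Header copied from the instructions above; it must precede GPT-4's code.
PREFIX = """import bpy
import os
import math
import random
from random import uniform
import mathutils

from sys import path
path.append(bpy.path.abspath("./"))
from BlenderTool.utils import *

ASSETS_PATH = 'BlenderTool/assets/'
"""

# Matches a Markdown code fence (three backticks plus an optional language tag).
_FENCE_RE = re.compile(r"`{3}[a-zA-Z]*\n(.*?)`{3}", re.DOTALL)


def strip_code_fence(reply: str) -> str:
    """Return the first fenced code block in a chat reply, or the reply itself."""
    match = _FENCE_RE.search(reply)
    return match.group(1) if match else reply


def build_script(gpt_reply: str) -> str:
    """Prepend the required header to the code GPT-4 generated."""
    return PREFIX + "\n" + strip_code_fence(gpt_reply).strip() + "\n"


def write_script(gpt_reply: str, out_path: str = "script.py") -> Path:
    # Hypothetical output name; write it inside PhysicsGeneration so the
    # relative BlenderTool import and ASSETS_PATH resolve when Blender runs it.
    path = Path(out_path)
    path.write_text(build_script(gpt_reply), encoding="utf-8")
    return path
```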

This yields a Python file like "script.py". Run it with Blender in background mode to generate the edge and depth maps:

blender -b -P script.py
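To process several generated scripts in one go, the same command can be driven from Python. This sketch only wraps the `blender -b -P <script>` call shown above (`-b` runs Blender without its UI, `-P` executes the given Python file); the helper names and the assumption that a `blender` executable is on your `PATH` are illustrative.

```python
import shutil
import subprocess


def blender_command(script: str, blender: str = "blender") -> list[str]:
    """Build the headless Blender call used above: blender -b -P <script>."""
    return [blender, "-b", "-P", script]


def run_scripts(scripts: list[str], workdir: str = ".", blender: str = "blender") -> None:
    # Run from PhysicsGeneration so BlenderTool/ and ASSETS_PATH resolve.
    if shutil.which(blender) is None:
        raise FileNotFoundError(f"{blender!r} not found on PATH")
    for script in scripts:
        # check=True stops at the first script whose simulation fails.
        subprocess.run(blender_command(script, blender), cwd=workdir, check=True)
```

For example, `run_scripts(["script.py"], workdir="PhysicsGeneration")` would be equivalent to the command above.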

The generated edge maps and depth maps are saved in the "../data/new/" folder.
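Before moving on to video generation, it can help to check that every depth map has a matching edge map. The sketch below assumes the output follows the same layout as `data/basketball/...` in the directory tree (a `depth/` subfolder holding `depth_XXXX.png` and a `freestyle/` subfolder holding `canny_XXXX.png`); that layout for new outputs and the function names are assumptions.

```python
import re
from pathlib import Path

# Frame files are named like depth_0000.png and canny_0000.png.
_FRAME_RE = re.compile(r"^(?:depth|canny)_(\d+)\.png$")


def _index_frames(folder: Path) -> dict[int, Path]:
    """Map frame number -> file for every depth_/canny_ PNG in folder."""
    frames = {}
    for p in folder.glob("*.png"):
        m = _FRAME_RE.match(p.name)
        if m:
            frames[int(m.group(1))] = p
    return frames


def pair_frames(sample_dir: str) -> list[tuple[Path, Path]]:
    """Return (depth, canny) pairs sorted by frame; raise if any frame is unmatched."""
    root = Path(sample_dir)
    depth = _index_frames(root / "depth")
    canny = _index_frames(root / "freestyle")
    missing = sorted(set(depth) ^ set(canny))  # frames present in only one folder
    if missing:
        raise ValueError(f"frames without a matching depth/edge map: {missing}")
    return [(depth[i], canny[i]) for i in sorted(depth)]
```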

Video Generation

Please move to the "VideoGeneration" folder and install the corresponding environment:

cd ../VideoGeneration/
conda create -n GPT4Motion python=3.9
conda activate GPT4Motion
pip install -r requirements.txt

To generate a video from our pre-existing depth and edge maps, run:

python main.py config/basketball.yaml

The generated results are shown below:

https://github.com/jiaxilv/GPT4Motion/assets/67190845/cb0b2b9d-b2ec-44b1-a6cd-4076bfaede8d
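The repository ships only `config/basketball.yaml`, but if you add configs for your own Blender outputs, a loop like the following could render each of them by invoking `main.py` exactly as in the command above. The helper names are hypothetical, and the sketch assumes `main.py` takes a single config path as its only argument, as shown.

```python
import subprocess
import sys
from pathlib import Path


def config_commands(config_dir: str = "config") -> list[list[str]]:
    """One `python main.py <config>` command per YAML file, in sorted order."""
    configs = sorted(Path(config_dir).glob("*.yaml"))
    # sys.executable reuses the active interpreter (e.g. the GPT4Motion conda env).
    return [[sys.executable, "main.py", str(c)] for c in configs]


def run_all(config_dir: str = "config") -> None:
    # Run from VideoGeneration; check=True stops at the first failing config.
    for cmd in config_commands(config_dir):
        subprocess.run(cmd, check=True)
```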