mira-space / Mira

GNU General Public License v3.0

Mira: A Mini-step Towards Sora-like Long Video Generation

Zhaoyang Zhang¹, Ziyang Yuan¹, Xuan Ju¹, Yiming Gao¹, Xintao Wang¹#, Chun Yuan, Ying Shan¹
¹ARC Lab, Tencent PCG   *Equal contribution   #Project lead

Project Page | MiraData Page | arXiv | Data Link

We introduce Mira (Mini-Sora), an initial exploration of high-quality, long-duration video generation in the style of Sora. Mira differs from existing text-to-video (T2V) generation frameworks in several key ways:

Please note that Mira is still experimental. Sora still significantly outperforms Mira and other open-source T2V frameworks in several areas, including:

The Mira project is our endeavor to investigate and refine the entire data-model-training pipeline for Sora-like, lightweight T2V frameworks, and to preliminarily demonstrate the aforementioned Sora characteristics. Our goal is to foster innovation and democratize the field of content creation, paving the way for more accessible and advanced video generation tools.

Results

5s 768x480

https://github.com/mira-space/Mira/assets/163223899/5e7d74d3-82a4-4a94-bfc1-9140b7929c50

10s 384×240

https://github.com/mira-space/Mira/assets/13939478/4de6aade-4eca-4291-bcc6-950c7b44c981

Each individual video can be downloaded from here.

20s 128×80

https://github.com/mira-space/Mira/assets/13939478/9f274503-9715-4d2a-a262-10113c4df78f

📰 Updates

Stay tuned! We are actively developing this project and will post regular updates as we expand the dataset, improve the annotation pipeline, and refine the model checkpoints.

[2024.07.11] 🔥 We're glad to announce the release of Mira-v1 and MiraData-v1! The full MiraData-v1 dataset is now available, along with the corresponding technical report (https://arxiv.org/abs/2407.06358v1). We have also updated the MiraDiT model with the new data to improve quality; it now supports resolutions up to 768x480 and durations up to 10 seconds. This version also includes an optional post-processing step for video interpolation and enhancement based on the RIFE framework (https://github.com/hzwer/ECCV2022-RIFE).

[2024.04.01] 🔥 We're delighted to announce the release of Mira and MiraData-v0. This release offers a comprehensive open-source suite of data annotation and training pipelines, tailored for generating long-duration videos with dynamic content and consistent quality. The released code and checkpoints can generate videos of up to 20 seconds at 128x80 resolution and up to 10 seconds at 384x240 resolution. Dive into the future of video generation with Mira!
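The 2024.07.11 release mentions optional RIFE-based interpolation as post-processing. As a rough guide to what temporal interpolation does to a clip, the sketch below computes frame counts for a 2^k upsampling pass. It assumes new frames are inserted only between existing consecutive frames and that the playback rate is multiplied by the same factor. The function names interp_frames and interp_fps are hypothetical and not part of Mira or RIFE, and Mira's actual post-processing settings may differ.

```bash
#!/usr/bin/env bash
# Sketch only: frame-count arithmetic for a 2^EXP temporal interpolation pass.
# Assumption: (2^EXP - 1) new frames are inserted between each pair of
# consecutive frames, and nothing is added after the last frame.
# These helpers are hypothetical; Mira's RIFE settings may differ.

# interp_frames N EXP -> number of frames after interpolation
interp_frames() {
  local n=$1 exp=$2
  echo $(( (n - 1) * (1 << exp) + 1 ))
}

# interp_fps FPS EXP -> playback rate that keeps the clip duration roughly unchanged
interp_fps() {
  local fps=$1 exp=$2
  echo $(( fps * (1 << exp) ))
}

# Example: a 30-frame clip at 6 fps with one doubling pass.
echo "$(interp_frames 30 1) frames at $(interp_fps 6 1) fps"   # prints: 59 frames at 12 fps
```

The 30-frame, 6 fps example matches the 768-v1-5s.pt checkpoint listed under Checkpoints (30 frames over 5 seconds); the interpolated clip then plays for about 4.9 seconds instead of 5.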

Installation

## create a conda environment
conda update -n base -c defaults conda
conda create -y -n mira python=3.8.5
conda activate mira

## install dependencies
pip install torch==2.0 torchvision torchaudio decord==0.6.0 \
einops==0.3.0 imageio==2.9.0 \
numpy omegaconf==2.1.1 opencv_python pandas \
Pillow==9.5.0 pytorch_lightning==1.9.0 PyYAML==6.0 setuptools==65.6.3 \
tqdm==4.65.0 transformers==4.25.1 moviepy av tensorboardx \
&& pip install timm scikit-learn open_clip_torch==2.22.0 kornia simplejson easydict pynvml rotary_embedding_torch==0.3.1 triton cached_property \
&& pip install xformers==0.0.18 \
&& pip install taming-transformers fairscale deepspeed diffusers

Training

Checkpoints

| Name | Model Size | Data | Resolution |
|---|---|---|---|
| 128-v0.pt | 1.1B | Webvid (pretrain) + MiraData-v0 | 128x80, 120 frames |
| 384-v0.pt | 1.1B | Webvid (pretrain) + MiraData-v0 | 384x240, 60 frames |
| 384-v1-10s.pt | 1.1B | Webvid (pretrain) + MiraData-v1 | 384x240, 60 frames |
| 384-v1-10s.pt | 1.1B | Webvid (pretrain) + MiraData-v1 | 384x240, 120 frames |
| 768-v1-5s.pt | 1.1B | Webvid (pretrain) + MiraData-v1 | 768x480, 30 frames |
| 768-v1-10s.pt (Coming Soon) | 1.1B | Webvid (pretrain) + MiraData-v1 | 768x480, 60 frames |

Please download the above checkpoints from our Hugging Face pages (Mira-V0 and Mira-V1).
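Combining the frame counts in the table with the durations stated in the Updates notes and checkpoint names shows a common sampling rate. The worked arithmetic below covers only the three checkpoints whose duration is stated unambiguously (128-v0.pt: 20 s, 384-v0.pt: 10 s, 768-v1-5s.pt: 5 s) and assumes frames are spread evenly over that duration; it is derived from the figures above, not an official specification.

```latex
\begin{align*}
\text{fps} &= \frac{\text{frames}}{\text{duration}}
  = \frac{120}{20\,\text{s}} = \frac{60}{10\,\text{s}} = \frac{30}{5\,\text{s}} = 6\ \text{fps} \\
\text{aspect ratio} &= \frac{128}{80} = \frac{384}{240} = \frac{768}{480} = \frac{8}{5} = 1.6
\end{align*}
```

All three resolutions share the same 8:5 aspect ratio, so moving between them rescales each side by an integer factor: 3x from 128x80 to 384x240, and 2x from 384x240 to 768x480.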

Fine-tuning the Mira-v0 model at 768x480 resolution.

## activate environment
conda activate mira

## Run training
bash configs/Mira/run_768v1_mira.sh 0

Fine-tuning the Mira-v0 model at 384x240 resolution.

## activate environment
conda activate mira

## Run training
bash configs/Mira/run_384v1_mira.sh 0

Inference

Running inference with the Mira-v1 model at 768x480 resolution.

## activate environment
conda activate mira

## Run inference
bash configs/inference/run_text2video_768.sh

Running inference with the Mira-v1 model at 384x240 resolution.

## activate environment
conda activate mira

## Run inference
bash configs/inference/run_text2video_384.sh
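The fine-tuning and inference commands above differ only in the launcher script. A small shell wrapper can pick the script from a task and a resolution, and refuse to run outside the mira environment. This is a hypothetical convenience sketch, not part of the repository: the names mira_script and mira_run are made up, the environment check relies on the CONDA_DEFAULT_ENV variable that conda activate sets, and any extra arguments (such as the 0 passed to the training scripts) are forwarded unchanged because their meaning is defined inside those scripts.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the Mira repo) around the launcher scripts above.

# mira_script TASK RES -> path of the launcher script for train|infer at 768|384
mira_script() {
  local task=$1 res=$2
  case "$task:$res" in
    train:768) echo "configs/Mira/run_768v1_mira.sh" ;;
    train:384) echo "configs/Mira/run_384v1_mira.sh" ;;
    infer:768) echo "configs/inference/run_text2video_768.sh" ;;
    infer:384) echo "configs/inference/run_text2video_384.sh" ;;
    *) echo "unsupported task/resolution: $task $res" >&2; return 1 ;;
  esac
}

# mira_run TASK RES [ARGS...] -> run the matching script inside the mira env
mira_run() {
  if [ "$#" -lt 2 ]; then
    echo "usage: mira_run <train|infer> <768|384> [args...]" >&2
    return 1
  fi
  local task=$1 res=$2
  shift 2
  # conda activate sets CONDA_DEFAULT_ENV to the active environment's name.
  if [ "${CONDA_DEFAULT_ENV:-}" != "mira" ]; then
    echo "activate the environment first: conda activate mira" >&2
    return 1
  fi
  local script
  script=$(mira_script "$task" "$res") || return 1
  # Remaining arguments are forwarded as-is to the launcher script.
  bash "$script" "$@"
}
```

For example, mira_run train 768 0 runs the 768x480 fine-tuning script with the same argument as above, and mira_run infer 384 runs 384x240 inference.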

Licence

Mira is released under the GPL-v3 license, which permits commercial use. If you need a commercial license for Mira under different terms, please feel free to contact us.