
Upscale-A-Video:
Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution

Shangchen Zhou∗, Peiqing Yang∗, Jianyi Wang, Yihang Luo, Chen Change Loy
S-Lab, Nanyang Technological University
CVPR 2024 (Highlight)

Upscale-A-Video is a diffusion-based model that upscales videos by taking the low-resolution video and text prompts as inputs.
📖 For more visual results, check out our project page.

🔥 Update

🎬 Overview

[Figure: overall structure of Upscale-A-Video]

🔧 Dependencies and Installation

  1. Clone Repo

    git clone https://github.com/sczhou/Upscale-A-Video.git
    cd Upscale-A-Video
  2. Create Conda Environment and Install Dependencies

    # create new conda env
    conda create -n UAV python=3.9 -y
    conda activate UAV
    
    # install python dependencies
    pip install -r requirements.txt
  3. Download Models

    (a) Download pretrained models and configs from Google Drive and put them under the pretrained_models/upscale_a_video folder.

    The pretrained_models directory structure should be arranged as:

    β”œβ”€β”€ pretrained_models
    β”‚   β”œβ”€β”€ upscale_a_video
    β”‚   β”‚   β”œβ”€β”€ low_res_scheduler
    β”‚   β”‚       β”œβ”€β”€ ...
    β”‚   β”‚   β”œβ”€β”€ propagator
    β”‚   β”‚       β”œβ”€β”€ ...
    β”‚   β”‚   β”œβ”€β”€ scheduler
    β”‚   β”‚       β”œβ”€β”€ ...
    β”‚   β”‚   β”œβ”€β”€ text_encoder
    β”‚   β”‚       β”œβ”€β”€ ...
    β”‚   β”‚   β”œβ”€β”€ tokenizer
    β”‚   β”‚       β”œβ”€β”€ ...
    β”‚   β”‚   β”œβ”€β”€ unet
    β”‚   β”‚       β”œβ”€β”€ ...
    β”‚   β”‚   β”œβ”€β”€ vae
    β”‚   β”‚       β”œβ”€β”€ ...

    (b) (Optional) LLaVA will be downloaded automatically when --use_llava is set to True, for users with access to Hugging Face.
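
    If you plan to use --use_llava, log in to Hugging Face first so the LLaVA weights can be fetched automatically. A minimal sketch, assuming the huggingface_hub package (which provides the huggingface-cli command) is installed:

    # install the CLI if needed, then authenticate with your HF access token
    pip install -U huggingface_hub
    huggingface-cli login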

☕️ Quick Inference

The --input_path argument can be either the path to a single video or a folder containing multiple videos.

We provide several examples in the inputs folder. Run the following commands to try it out:

## AIGC videos
python inference_upscale_a_video.py \
-i ./inputs/aigc_1.mp4 -o ./results -n 150 -g 6 -s 30 -p 24,26,28

python inference_upscale_a_video.py \
-i ./inputs/aigc_2.mp4 -o ./results -n 150 -g 6 -s 30 -p 24,26,28

python inference_upscale_a_video.py \
-i ./inputs/aigc_3.mp4 -o ./results -n 150 -g 6 -s 30 -p 20,22,24

## old videos/movies/animations
python inference_upscale_a_video.py \
-i ./inputs/old_video_1.mp4 -o ./results -n 150 -g 9 -s 30

python inference_upscale_a_video.py \
-i ./inputs/old_movie_1.mp4 -o ./results -n 100 -g 5 -s 20 -p 17,18,19

python inference_upscale_a_video.py \
-i ./inputs/old_movie_2.mp4 -o ./results -n 120 -g 6 -s 30 -p 8,10,12

python inference_upscale_a_video.py \
-i ./inputs/old_animation_1.mp4 -o ./results -n 120 -g 6 -s 20 --use_video_vae
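
Since --input_path also accepts a folder, all of the sample clips can be processed in a single call. A minimal sketch, reusing the settings of the first AIGC example (here -n, -g, -s, and -p are assumed to control the noise level, guidance scale, number of diffusion steps, and propagation steps; run python inference_upscale_a_video.py --help to confirm):

python inference_upscale_a_video.py \
-i ./inputs -o ./results -n 150 -g 6 -s 30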

If you notice any color discrepancies between the output and the input, you can set --color_fix to "AdaIn" or "Wavelet". By default, it is set to "None".
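
For example, to re-run the first old-movie command above with the wavelet-based fix enabled:

python inference_upscale_a_video.py \
-i ./inputs/old_movie_1.mp4 -o ./results -n 100 -g 5 -s 20 -p 17,18,19 --color_fix Wavelet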

🎞️ YouHQ Dataset

The datasets are hosted on Google Drive.

| Dataset | Link | Description |
| :------ | :--- | :---------- |
| YouHQ-Train | Google Drive | 38,576 videos for training, each of which has around 32 frames. |
| YouHQ40-Test | Google Drive | 40 video clips for evaluation, each of which has around 32 frames. |
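
After downloading, unpack the archives into your data directory. A minimal sketch, assuming the files arrive as zip archives named after the datasets (the actual file names on Google Drive may differ):

unzip YouHQ-Train.zip -d ./datasets/YouHQ-Train
unzip YouHQ40-Test.zip -d ./datasets/YouHQ40-Test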

📑 Citation

If you find our repo useful for your research, please consider citing our paper:

   @inproceedings{zhou2024upscaleavideo,
      title={{Upscale-A-Video}: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution},
      author={Zhou, Shangchen and Yang, Peiqing and Wang, Jianyi and Luo, Yihang and Loy, Chen Change},
      booktitle={CVPR},
      year={2024}
   }

📝 License

This project is licensed under the NTU S-Lab License 1.0. Redistribution and use should follow this license.

📧 Contact

If you have any questions, please feel free to reach us at shangchenzhou@gmail.com or peiqingyang99@outlook.com.