yuanxion opened 1 year ago
For inference,
- Colossal-AI achieves up to 1.42x faster single-GPU inference
- Colossal-AI reduces inference GPU memory consumption by 2.5x
Wow, that's great. Could you please create a PR for it, thx.
And maybe also collect the performance (baseline vs colossal-ai) data within a table, something like:
| Performance | baseline | colossal-ai | deepspeed |
| --- | --- | --- | --- |
| Running time (s) | | | |
| GPU utilization (%) | | | |
| Memory usage (MiB) | | | |
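One hypothetical way to collect the running-time and GPU numbers for that table (the measurement helpers below are assumptions, not part of this project; `nvidia-smi` ships with the NVIDIA driver):

```python
import subprocess
import time

def timed(fn):
    # Wall-clock running time of a single call, in seconds.
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def gpu_stats():
    # GPU utilization (%) and memory usage (MiB) via nvidia-smi.
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    util, mem = out.strip().split(", ")
    return int(util), int(mem)

# Replace the sleep with the actual baseline / colossal-ai inference call.
elapsed = timed(lambda: time.sleep(0.1))
```

Sampling `gpu_stats()` in a background thread during inference would give the peak utilization and memory to fill in per column.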
Also, this project has a `config.py` whose `save_memory` flag may also help save GPU memory:

```python
# config.py
save_memory = False
```

```python
# share.py
if config.save_memory:
    enable_sliced_attention()
```
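The project's actual `enable_sliced_attention()` implementation isn't shown here, but the idea behind sliced attention can be sketched in a few lines (NumPy stand-in; the function names and `slice_size` parameter are illustrative, not the project's API). Only a `(slice_size, Lk)` score matrix is live at any time, which lowers peak memory at some speed cost:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Full attention: materializes the complete (Lq, Lk) score matrix.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def sliced_attention(q, k, v, slice_size=2):
    # Process queries slice by slice so only a (slice_size, Lk)
    # score matrix exists at once, trading speed for peak memory.
    out = np.empty_like(q)
    for i in range(0, q.shape[0], slice_size):
        out[i:i + slice_size] = attention(q[i:i + slice_size], k, v)
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
assert np.allclose(attention(q, k, v), sliced_attention(q, k, v))
```

Because softmax is applied row-wise, slicing over queries is exact: the sliced result matches full attention to floating-point precision.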
ColossalAI is a distributed deep learning framework that mainly targets training and inference of large-scale models. It provides parallelization techniques such as data parallelism, tensor parallelism, and pipeline parallelism, as well as optimization tools such as mixed precision training, gradient accumulation, and offloading. For single-machine, single-GPU models, ColossalAI may not bring a significant benefit, because such models do not require distributed systems or multi-GPU parallelism. If you want to use ColossalAI to accelerate a single-machine, single-GPU model, you can try its optimization tools, such as mixed precision training or gradient accumulation, to improve training speed or increase the batch size. However, these tools are also available in other frameworks, such as PyTorch or TensorFlow. Therefore, ColossalAI's main advantage lies with large-scale models, not single-machine, single-GPU models.
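The gradient-accumulation idea mentioned above is framework-agnostic. A minimal NumPy sketch (synthetic data and a toy linear model, purely illustrative) shows why accumulating scaled micro-batch gradients reproduces the full-batch gradient, letting you emulate a large batch without holding it in memory:

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.standard_normal((32, 4)), rng.standard_normal(32)
w = np.zeros(4)

def grad(w, Xb, yb):
    # Gradient of mean-squared error for a linear model y ~ Xb @ w.
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# Full-batch gradient vs. the same gradient accumulated over 4 micro-batches.
full = grad(w, X, y)
accum = np.zeros(4)
for Xb, yb in zip(np.split(X, 4), np.split(y, 4)):
    accum += grad(w, Xb, yb) / 4  # scale each micro-batch contribution
assert np.allclose(full, accum)
```

In a training loop you would call the optimizer step only after the last micro-batch, once the accumulated gradient equals the large-batch one.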
| Performance | baseline | colossal-ai | deepspeed |
| --- | --- | --- | --- |
| Running time (s) | 7 min | | |
| GPU utilization (%) | 99% | | |
| Memory usage (MiB) | 9228 / 12288 | | |
Colossal-AI can be used to reduce GPU memory consumption during training (though it may increase training time). https://github.com/hpcaitech/ColossalAI
Colossal-AI is used to accelerate AIGC (AI-Generated Content) models such as Stable Diffusion v1 and v2 (NVIDIA GPU / Habana Gaudi).