yuanxion opened 1 year ago
For inference,
- Colossal-AI achieves up to 1.42x faster single-GPU inference
- Colossal-AI reduces inference GPU memory consumption by 2.5x
Wow, that's great. Could you please create a PR for it, thx.
And maybe also collect the performance (baseline vs colossal-ai) data within a table, something like:
| Performance | baseline | colossal-ai | deepspeed |
| --- | --- | --- | --- |
| Running time (s) | | | |
| GPU utilization (%) | | | |
| Memory usage (MiB) | | | |
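One hypothetical way to collect the running-time and GPU numbers for that table (the measurement helpers below are assumptions, not part of this project; `nvidia-smi` ships with the NVIDIA driver):

```python
import subprocess
import time

def timed(fn):
    # Wall-clock running time of a single call, in seconds.
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def gpu_stats():
    # GPU utilization (%) and memory usage (MiB) via nvidia-smi.
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    util, mem = out.strip().split(", ")
    return int(util), int(mem)

# Replace the sleep with the actual baseline / colossal-ai inference call.
elapsed = timed(lambda: time.sleep(0.1))
```

Sampling `gpu_stats()` in a background thread during inference would give the peak utilization and memory to fill in per column.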
Also, this project has a `config.py` whose `save_memory` flag may also help save GPU memory:

```python
# config.py
save_memory = False
```

```python
# share.py
if config.save_memory:
    enable_sliced_attention()
```
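The project's actual `enable_sliced_attention()` implementation isn't shown here, but the idea behind sliced attention can be sketched in a few lines (NumPy stand-in; the function names and `slice_size` parameter are illustrative, not the project's API). Only a `(slice_size, Lk)` score matrix is live at any time, which lowers peak memory at some speed cost:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Full attention: materializes the complete (Lq, Lk) score matrix.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def sliced_attention(q, k, v, slice_size=2):
    # Process queries slice by slice so only a (slice_size, Lk)
    # score matrix exists at once, trading speed for peak memory.
    out = np.empty_like(q)
    for i in range(0, q.shape[0], slice_size):
        out[i:i + slice_size] = attention(q[i:i + slice_size], k, v)
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
assert np.allclose(attention(q, k, v), sliced_attention(q, k, v))
```

Because softmax is applied row-wise, slicing over queries is exact: the sliced result matches full attention to floating-point precision.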
ColossalAI is a distributed deep learning framework that mainly targets training and inference of large-scale models. It provides parallelization techniques such as data parallelism, tensor parallelism, and pipeline parallelism, as well as optimization tools such as mixed precision training, gradient accumulation, and offloading. For single-machine, single-GPU models, ColossalAI may not bring a significant benefit, because such models do not require distributed systems or multi-GPU parallelism. If you want to use ColossalAI to accelerate a single-machine, single-GPU model, you can try its optimization tools, such as mixed precision training or gradient accumulation, to improve training speed or increase the batch size. However, these tools are also available in other frameworks, such as PyTorch or TensorFlow. Therefore, ColossalAI's main advantage lies with large-scale models, not single-machine, single-GPU models.
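The gradient-accumulation idea mentioned above is framework-agnostic. A minimal NumPy sketch (synthetic data and a toy linear model, purely illustrative) shows why accumulating scaled micro-batch gradients reproduces the full-batch gradient, letting you emulate a large batch without holding it in memory:

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.standard_normal((32, 4)), rng.standard_normal(32)
w = np.zeros(4)

def grad(w, Xb, yb):
    # Gradient of mean-squared error for a linear model y ~ Xb @ w.
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# Full-batch gradient vs. the same gradient accumulated over 4 micro-batches.
full = grad(w, X, y)
accum = np.zeros(4)
for Xb, yb in zip(np.split(X, 4), np.split(y, 4)):
    accum += grad(w, Xb, yb) / 4  # scale each micro-batch contribution
assert np.allclose(full, accum)
```

In a training loop you would call the optimizer step only after the last micro-batch, once the accumulated gradient equals the large-batch one.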
| Performance | baseline | colossal-ai | deepspeed |
| --- | --- | --- | --- |
| Running time (s) | 7 min | | |
| GPU utilization (%) | 99% | | |
| Memory usage (MiB) | 9228 / 12288 | | |
Colossal-AI can be used to reduce GPU memory consumption during training (though it may increase training time). https://github.com/hpcaitech/ColossalAI
Colossal-AI is used to accelerate AIGC (AI-Generated Content) models such as Stable Diffusion v1 and v2 (NVIDIA GPU / Habana Gaudi).