microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[REQUEST] Inference Optimized Pipeline Parallelism #3935

Open champson opened 1 year ago

champson commented 1 year ago

The paper https://arxiv.org/abs/2207.00032 describes inference pipeline parallelism in DeepSpeed, including hybrid scheduling and the offloading of activations with overlapped communication, and reports significant performance improvements from these techniques. Does the released version of DeepSpeed currently support these features? If not, is there a timeline for when they will be supported?
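For context, the pipeline parallelism being asked about splits a model into sequential stages and streams micro-batches through them so that different stages work on different micro-batches at the same time. Below is a minimal, framework-free sketch of a simple fill-drain (GPipe-style) schedule; the function and variable names are illustrative only and are not DeepSpeed API:

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Return, per time step, the list of (stage, micro_batch) pairs
    that run concurrently in a simple fill-drain pipeline.

    Micro-batch m enters stage s at time step t = s + m, so the whole
    schedule takes num_stages + num_microbatches - 1 steps instead of
    num_stages * num_microbatches sequential steps.
    """
    total_steps = num_stages + num_microbatches - 1
    steps = []
    for t in range(total_steps):
        active = []
        for stage in range(num_stages):
            mb = t - stage  # micro-batch this stage works on at step t
            if 0 <= mb < num_microbatches:
                active.append((stage, mb))
        steps.append(active)
    return steps
```

For example, 4 stages and 8 micro-batches complete in 11 pipelined time steps rather than 32 sequential ones; the "hybrid" schedules in the paper go further by interleaving prompt processing and token generation, which this sketch does not model.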

yefanhust commented 1 year ago

I'm also very interested in this issue. It would be great to get a clear response from the community on this matter. Thanks!

samyam commented 1 year ago

@champson, @yefanhust, are there specific models or scenarios you are looking to apply pipeline parallelism to? The set of scenarios where PP helps inference is very narrow, and it is applicable in just a handful of cases currently, so we have de-prioritized releasing these features. But we can revisit this if there is strong interest from the community.
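One way to see why the applicable scenarios are narrow: a fill-drain pipeline only reaches high utilization when many micro-batches (i.e., enough concurrent requests) are in flight to amortize the pipeline fill and drain bubbles, which latency-sensitive inference often lacks. A back-of-the-envelope sketch (illustrative arithmetic, not DeepSpeed code):

```python
def pipeline_utilization(num_stages, num_microbatches):
    """Fraction of stage time steps doing useful work in a simple
    fill-drain pipeline: m / (m + s - 1) for s stages and m micro-batches.

    The (s - 1) term is the fill/drain bubble, which is fixed overhead
    regardless of how many micro-batches flow through the pipeline.
    """
    return num_microbatches / (num_microbatches + num_stages - 1)

# A single request (m = 1) leaves a 4-stage pipeline mostly idle,
# while a deep backlog of requests amortizes the bubble overhead.
low = pipeline_utilization(4, 1)    # 1 / 4  = 0.25
high = pipeline_utilization(4, 32)  # 32 / 35, roughly 0.91
```

Under this simplified model, inference PP mainly pays off for throughput-oriented serving of models too large for a single device; for small batches or tight latency budgets, tensor parallelism or single-device kernel injection is usually a better fit.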

Noblezhong commented 1 month ago

Hi! I am also interested in the feature described in this paper. Is there any demo or tutorial for the 'hybrid pipeline inference schedule'?