microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[REQUEST]Support for multiple node inference? #4704

Open sleepwalker2017 opened 9 months ago

sleepwalker2017 commented 9 months ago

Hi, I want to run an LLM across multiple machines.

Within a single node, I want to use tensor parallelism to speed up inference.

Across nodes, I want to use pipeline parallelism.

Is this supported? If so, is there any documentation? I couldn't find any.

If not, is there any documentation for inference across multiple nodes?

tjruwase commented 9 months ago

@sleepwalker2017, yes, DeepSpeed supports combining tensor and pipeline parallelism, a.k.a. 3D parallelism. However, these techniques are not natively provided by DeepSpeed; instead, you need to implement them in your model. You can explore example implementations from deepspeed and bigscience.
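To make the division of labor concrete: DeepSpeed provides a `PipelineModule` abstraction that you wire your model into yourself, which is what "implement them in your model" means in practice. A minimal sketch, assuming a toy layer list; the layer sizes, stage count, and config values are illustrative, not from this thread:

```python
# Sketch: expressing a model as a DeepSpeed training pipeline.
# PipelineModule splits an ordered list of layers across pipeline stages;
# tensor parallelism would be implemented inside the layers themselves
# (e.g. Megatron-LM-style sharded Linear layers).
import torch
import deepspeed
from deepspeed.pipe import PipelineModule

layers = [
    torch.nn.Linear(1024, 1024),  # illustrative toy layers
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
]

# num_stages=2 places roughly half the layers on each pipeline stage.
model = PipelineModule(layers=layers, num_stages=2)

# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler).
engine, _, _, _ = deepspeed.initialize(
    model=model,
    config={"train_batch_size": 8, "train_micro_batch_size_per_gpu": 1},
)
```

This is a training-side sketch and must be launched with the `deepspeed` launcher on a multi-GPU setup; it is not runnable standalone.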

sleepwalker2017 commented 9 months ago

> @sleepwalker2017, yes, DeepSpeed supports combining tensor and pipeline parallelism, a.k.a. 3D parallelism. However, these techniques are not natively provided by DeepSpeed; instead, you need to implement them in your model. You can explore example implementations from deepspeed and bigscience.

Thank you. It seems this requires substantial modifications to the model. Does that mean we need to modify the model code to add communication primitives?

Is tensor parallelism between nodes natively supported in DeepSpeed? Is there any example of this feature?

tjruwase commented 9 months ago

Sorry, I just noticed you are asking about inference, not training. For inference, DeepSpeed provides native support for tensor parallelism. Please see the following:

  1. https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen
  2. https://www.deepspeed.ai/2022/10/10/mii.html
  3. https://www.deepspeed.ai/2021/03/15/inference-kernel-optimization.html
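As a concrete illustration of the native tensor-parallel inference support, here is a minimal sketch using `deepspeed.init_inference`. The model name, GPU count, and generation parameters are illustrative assumptions, not from the thread:

```python
# Sketch: DeepSpeed tensor-parallel inference on one node (illustrative values).
# Launch with the DeepSpeed launcher, e.g.: deepspeed --num_gpus 2 infer.py
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "facebook/opt-1.3b"  # illustrative model choice
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)

# mp_size sets the tensor-parallel degree: DeepSpeed shards the weights
# across that many GPUs and injects optimized inference kernels.
engine = deepspeed.init_inference(
    model,
    mp_size=2,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(engine.module.device)
outputs = engine.module.generate(inputs.input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note this requires a multi-GPU machine and the `deepspeed` launcher; newer DeepSpeed releases also accept a `tensor_parallel` config in place of `mp_size`.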
tjruwase commented 9 months ago

> Is tensor parallelism between nodes natively supported in DeepSpeed? Is there any example of this feature?

Yes, this should work natively for inference, but performance will be poor because of cross-node communication.

sleepwalker2017 commented 9 months ago

> Is tensor parallelism between nodes natively supported in DeepSpeed? Is there any example of this feature?
>
> Yes, this should work natively for inference, but performance will be poor because of cross-node communication.

So pipeline parallelism is not supported for inference, is that correct?

Also, I have tried DeepSpeed for local inference on a single machine, and the usage is simple.

But I couldn't find any documentation for deploying a model across machines. Are there any docs for that?

Thank you.

tjruwase commented 9 months ago

Please ensure your environment is set up for multi-node execution, and then try the same steps as for single-machine inference.
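For example, multi-node execution with the DeepSpeed launcher is typically configured through a hostfile that lists each node and its GPU count; a sketch, with hostnames, slot counts, and the script name as illustrative assumptions:

```shell
# hostfile: one line per node, with the number of GPUs ("slots") on each, e.g.
#   worker-1 slots=8
#   worker-2 slots=8
# The launcher requires passwordless SSH from the launch node to every host.

# Run the same inference script used on a single machine, now across 2 nodes:
deepspeed --hostfile=hostfile --num_nodes 2 --num_gpus 8 infer.py
```

With this setup the per-script code is unchanged; the launcher starts one rank per GPU on every listed host and sets up the distributed environment.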

sleepwalker2017 commented 9 months ago

> Please ensure your environment is set up for multi-node execution, and then try the same steps as for single-machine inference.

That's exactly what I need, thank you!

sleepwalker2017 commented 9 months ago

> Please ensure your environment is set up for multi-node execution, and then try the same steps as for single-machine inference.

Hi, does DeepSpeed support pipeline parallelism (PP) for inference?

Sri-Vadlamani commented 7 months ago

Hi @sleepwalker2017, I was wondering whether you were able to accomplish multi-node model inference using DeepSpeed. I am working on a similar problem but couldn't find any proper documentation. Thank you.