microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[REQUEST] How to do 3D Parallelism for HuggingFace Models? #3826

Open jacklanda opened 1 year ago

jacklanda commented 1 year ago

Overview

Does DeepSpeed support 3D parallelism (i.e., data parallelism + pipeline parallelism + tensor parallelism) for fine-tuning Hugging Face models (e.g., GPT-J, LLaMA)? Does anybody know a simple way to implement this with DeepSpeed? Thanks!

janelu9 commented 1 year ago

Turn the model into a pipeline: each layer's output is the next layer's input.
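
A minimal sketch of that idea in plain Python, with no DeepSpeed dependency: the model is flattened into an ordered list of layers, the list is cut into contiguous stages, and each layer's output is fed to the next. The names `partition_layers` and `run_pipeline` are hypothetical, and the toy callables stand in for transformer blocks. In DeepSpeed itself the layer list would be passed to `deepspeed.pipe.PipelineModule` rather than partitioned by hand like this.

```python
from typing import Callable, List, Sequence

Layer = Callable[[float], float]


def partition_layers(layers: Sequence[Layer], num_stages: int) -> List[List[Layer]]:
    """Split an ordered layer list into contiguous, near-equal stages."""
    if not 1 <= num_stages <= len(layers):
        raise ValueError("num_stages must be between 1 and len(layers)")
    base, extra = divmod(len(layers), num_stages)
    stages: List[List[Layer]] = []
    start = 0
    for stage_id in range(num_stages):
        # The first `extra` stages take one additional layer each.
        size = base + (1 if stage_id < extra else 0)
        stages.append(list(layers[start:start + size]))
        start += size
    return stages


def run_pipeline(stages: List[List[Layer]], x: float) -> float:
    """Feed each layer's output into the next layer, stage by stage."""
    for stage in stages:  # on real hardware, each stage lives on its own device
        for layer in stage:
            x = layer(x)
    return x


# Toy "layers" standing in for model blocks (illustrative only).
layers = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 3,
    lambda x: x * x,
    lambda x: x + 10,
]
stages = partition_layers(layers, num_stages=2)
result = run_pipeline(stages, 1.0)
```

Note that this only covers the pipeline axis; it says nothing about how data-parallel replicas or tensor-parallel shards are set up.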

jacklanda commented 1 year ago

> Turn the model into a pipeline: each layer's output is the next layer's input.

Thanks for your reply. However, this method is pipeline-only parallelism, not 3D :(
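
To make the distinction concrete: 3D parallelism factors the world size into three axes, so every GPU rank gets a (pipeline stage, data-parallel replica, tensor-parallel shard) coordinate. The sketch below is plain Python and only illustrates that bookkeeping. The names `Coord`, `rank_to_coord`, and `tensor_groups`, and the axis ordering (pipeline outermost, tensor innermost), are assumptions for illustration and not necessarily the layout DeepSpeed uses.

```python
from typing import List, NamedTuple


class Coord(NamedTuple):
    pipe: int    # pipeline stage index
    data: int    # data-parallel replica index
    tensor: int  # tensor-parallel shard index


def rank_to_coord(rank: int, pp: int, dp: int, tp: int) -> Coord:
    """Map a flat rank to its coordinate in a pp x dp x tp grid."""
    if not 0 <= rank < pp * dp * tp:
        raise ValueError("rank is outside the pp * dp * tp world size")
    pipe, rest = divmod(rank, dp * tp)
    data, tensor = divmod(rest, tp)
    return Coord(pipe, data, tensor)


def tensor_groups(pp: int, dp: int, tp: int) -> List[List[int]]:
    """Group ranks that share a pipeline stage and a data-parallel replica.

    Ranks in one group would jointly hold the shards of the same layers.
    """
    groups = {}
    for rank in range(pp * dp * tp):
        c = rank_to_coord(rank, pp, dp, tp)
        groups.setdefault((c.pipe, c.data), []).append(rank)
    return list(groups.values())


# 8 GPUs arranged as 2 pipeline stages x 2 replicas x 2 tensor shards.
coords = [rank_to_coord(r, pp=2, dp=2, tp=2) for r in range(8)]
tp_groups = tensor_groups(pp=2, dp=2, tp=2)
```

With `pp=8, dp=1, tp=1`, every rank differs only in its `pipe` coordinate, which is the pipeline-only setup; the other two axes are what turn it into 3D.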

janelu9 commented 1 year ago

> > Turn the model into a pipeline: each layer's output is the next layer's input.
>
> Thanks for your reply. However, this method is pipeline-only parallelism, not 3D :(

Yes, I have the same question: https://github.com/microsoft/DeepSpeed/issues/3888