hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible
https://www.colossalai.org
Apache License 2.0

[shardformer] support gradient accumulation for hybrid parallel plugin #4870

Open Fridge003 opened 9 months ago

Fridge003 commented 9 months ago

Support gradient accumulation for HybridParallelPlugin (by implementing a no_sync method for the plugin).

relevant issue: #4776
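
For context, here is a minimal sketch of the accumulation pattern that a plugin-level no_sync would enable, modeled on PyTorch DDP's existing no_sync context manager. This is an illustration of the general technique, not the proposed plugin API; the helper name train_step and the step count are illustrative assumptions.

```python
import torch
from torch import nn

ACCUMULATION_STEPS = 4  # illustrative value


def train_step(ddp_model: nn.Module, optimizer, criterion, micro_batches):
    """Accumulate gradients over micro-batches, syncing only on the boundary."""
    optimizer.zero_grad()
    for i, (inputs, targets) in enumerate(micro_batches):
        # Scale the loss so the accumulated gradient matches a full-batch update.
        loss = criterion(ddp_model(inputs), targets) / ACCUMULATION_STEPS
        if (i + 1) % ACCUMULATION_STEPS != 0:
            # No gradient all-reduce inside this context (DDP's no_sync).
            with ddp_model.no_sync():
                loss.backward()
        else:
            loss.backward()  # gradients are synchronized here
            optimizer.step()
            optimizer.zero_grad()
```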

ShinoharaHare commented 6 months ago

Hi, any updates? I need this feature badly.

ShinoharaHare commented 6 months ago

Or is it possible to enable it on HybridParallelPlugin in a torch-like way (described in the document)? However, unlike GeminiPlugin, it seems there is no enable_gradient_accumulation for HybridParallelPlugin. It's confusing.

flybird11111 commented 6 months ago

> Or is it possible to enable it on HybridParallelPlugin in a torch-like way (described in the document)? However, unlike GeminiPlugin, it seems there is no enable_gradient_accumulation for HybridParallelPlugin. It's confusing.

Hi, we will implement this feature as soon as possible.

flybird11111 commented 6 months ago

> Or is it possible to enable it on HybridParallelPlugin in a torch-like way (described in the document)? However, unlike GeminiPlugin, it seems there is no enable_gradient_accumulation for HybridParallelPlugin. It's confusing.

Hi, you can use gradient accumulation with HybridParallelPlugin in the torch-like way described at https://colossalai.org/docs/features/gradient_accumulation_with_booster.
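
For reference, a rough sketch of the torch-like accumulation loop that doc describes, assuming model, optimizer, criterion, dataloader and booster have already been set up via booster.boost(...) with a HybridParallelPlugin; GRADIENT_ACCUMULATION_STEPS and the variable names are illustrative, not plugin options.

```python
# Assumes `model`, `optimizer`, `criterion`, `dataloader` and `booster` were
# already created via booster.boost(...) with a HybridParallelPlugin.
GRADIENT_ACCUMULATION_STEPS = 4  # illustrative value

optimizer.zero_grad()
for step, (inputs, labels) in enumerate(dataloader):
    outputs = model(inputs)
    # Scale the loss so the accumulated gradient matches a full-batch update.
    loss = criterion(outputs, labels) / GRADIENT_ACCUMULATION_STEPS
    booster.backward(loss, optimizer)

    # Only update parameters and clear gradients every N micro-batches.
    if (step + 1) % GRADIENT_ACCUMULATION_STEPS == 0:
        optimizer.step()
        optimizer.zero_grad()
```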

cwszz commented 6 months ago

@flybird11111 Hi, I couldn't find enable_gradient_accumulation or no_sync() in HybridParallelPlugin (https://github.com/hpcaitech/ColossalAI/blob/main/colossalai/booster/plugin/hybrid_parallel_plugin.py), so I'm not sure how to add gradient accumulation with HybridParallelPlugin following https://colossalai.org/docs/features/gradient_accumulation_with_booster. Can you provide more details?