nijkah opened this issue 2 years ago
Hi @nijkah, we welcome any kind of contribution, and deepspeed integration is definitely something we want!
However, could you clarify what you mean by a "deepspeed-specified runner and optim_wrapper"? If you are planning to write a new runner
that only serves deepspeed models, that doesn't seem quite reasonable, and we might need more discussion on it ^^
Hi @nijkah, have you made any new progress on the deepspeed integration? I hope we can discuss it before you post a PR, because it might not be a small & easy one. If you have any ideas/problems/progress, we are always open to a discussion, either in this issue or on our discussion board.
Hi @C1rN09, our integration development is almost done, although there are still several design choices left to consider.
Our current implementation supports MM models, though a few features are not supported yet. There are several reasons why we decided to write a new deepspeed-dedicated runner.
Although we try to follow most of mmengine's Runner logic, some modifications are needed to support deepspeed. The main logic of DeepSpeedRunner looks like this:
>>> import json
>>> import deepspeed
>>> # Build the model and optim_wrapper with mmengine's usual machinery
>>> self.model = self.build_model(model)
>>> self.optim_wrapper = self.build_optim_wrapper(optim_wrapper)
>>> with open(cfg.deepspeed_config) as f:
...     ds_config = json.load(f)
>>> # deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler)
>>> self.model, optimizer, _, _ = deepspeed.initialize(
...     model=self.model,
...     optimizer=self.optim_wrapper.optimizer,
...     model_parameters=self.model.parameters(),
...     config=ds_config)
>>> # Hand the DeepSpeed-managed optimizer back to the optim_wrapper
>>> self.optim_wrapper.optimizer = optimizer
>>> self.inject_base_model_methods()
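As a side note, deepspeed.initialize also accepts a plain dict in place of the JSON file path, so the config could look roughly like this (the keys are standard DeepSpeed options, but the values below are only illustrative examples):
>>> # Illustrative DeepSpeed config; values are examples, not recommendations
>>> ds_config = {
...     "train_micro_batch_size_per_gpu": 2,
...     "gradient_accumulation_steps": 8,
...     "fp16": {"enabled": True},
...     "zero_optimization": {"stage": 2},
... }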
First, the order of the initialization logic has to change when using deepspeed, as sketched below. There was a similar modification in your FSDP PR, so this concern may go away in the future.
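To make the ordering point concrete, here is a rough sketch of the default flow we have to deviate from (the method names follow mmengine's Runner, but the exact call order inside Runner may differ):
>>> # Default mmengine flow (roughly): the model is wrapped for distributed
>>> # training first, e.g. with MMDistributedDataParallel, and the
>>> # optim_wrapper is built later from the wrapped model.
>>> self.model = self.wrap_model(model_wrapper_cfg, self.model)
>>> self.optim_wrapper = self.build_optim_wrapper(optim_wrapper)
>>> # With deepspeed this is inverted: the raw model and the optimizer must
>>> # both exist before deepspeed.initialize() wraps them together.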
Second, to use deepspeed, it seems better to rely on DeepSpeedEngine's internal logic for optimizers, which means we have to pass the optimizer variable to deepspeed.initialize (or DeepSpeedEngine).
Moreover, DeepSpeedEngine requires users to update parameters via engine.step(), which subsumes optimizer.step() and the related logic. This made us write a new DeepSpeedOptimWrapper class, sketched below.
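As a rough illustration of the idea (only a sketch, not our actual implementation; the engine attribute and the update_params body are assumptions):
>>> from mmengine.optim import OptimWrapper
>>> class DeepSpeedOptimWrapper(OptimWrapper):
...     """Sketch: delegate the parameter update to DeepSpeedEngine."""
...     def __init__(self, optimizer):
...         super().__init__(optimizer=optimizer)
...         self.engine = None  # set after deepspeed.initialize()
...     def update_params(self, loss):
...         # The engine owns backward() and step(); it handles loss scaling,
...         # gradient accumulation, and ZeRO partitioning internally.
...         self.engine.backward(loss)
...         self.engine.step()
This would keep the runner's training loop unchanged: it still calls optim_wrapper.update_params(loss), but the update is routed through the engine.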
I think it is better to share our prototype code once it is ready rather than keep explaining in writing. We can share a link to the repo containing the code before posting the PR.
Describe the feature
Motivation: Nowadays, deepspeed has become a fundamental framework that facilitates training and inference of large-scale or foundation models. We are developing deepspeed integration for mmengine, with support for a deepspeed-specified runner and optim_wrapper.
Does MMEngine have a plan to support deepspeed? If so, we can contribute our implementation to MMEngine :)
Please let me know about any guidance, plans, or opinions on this. :)