mosaicml / composer

Supercharge Your Model Training
http://docs.mosaicml.com
Apache License 2.0

Trainer.fit is missing the `optimizers` parameter #2503

Open antoinebrl opened 1 year ago

antoinebrl commented 1 year ago

Hello, the .fit() method of the Trainer is missing the optimizers parameter, although it is listed in its documentation.

It would be nice to keep the optimizers and the schedulers together. Since the schedulers depend on the dataloaders, I would argue the .fit() method should take the optimizers as input too.
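To illustrate, here is a minimal sketch of the proposed usage. This is not the current Composer API: the `optimizers=` argument to `.fit()` does not exist yet, and its name here is an assumption based on the `Trainer.__init__` signature and the `.fit()` docstring.

```python
# Sketch only: `optimizers=` in .fit() is the *proposed* parameter, not current API.
import torch
from composer import Trainer

model = ...              # any ComposerModel (assumed to be defined elsewhere)
train_dataloader = ...   # any torch DataLoader (assumed to be defined elsewhere)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)

trainer = Trainer(model=model, max_duration="1ep")

# Proposed: keep the optimizer next to the schedulers that depend on it
# (and on the dataloader length) by passing both to .fit().
trainer.fit(
    train_dataloader=train_dataloader,
    optimizers=optimizer,     # <-- proposed parameter
    schedulers=scheduler,
)
```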

mvpatel2000 commented 1 year ago

This is a good suggestion! We'd love to get a community PR on this issue :) Otherwise, we will add it to our roadmap, though unfortunately it might not be done in the immediate future.

wlrd commented 1 year ago

@antoinebrl @mvpatel2000 once we pass the optimizers into the .fit() method, what do we want to do with them? I am happy to take a look and help out on this if I can get some guidance on how to use the optimizers here.

mvpatel2000 commented 1 year ago

@wlrd Sure! In short, you'll need to duplicate the optimizer setup we do in __init__, starting here:

https://github.com/mosaicml/composer/blob/dev/composer/trainer/trainer.py#L988

I believe this just involves updating State to use the new optimizer.

The next hard step is adding support for DeepSpeed and FSDP. You can see the DeepSpeed step here: https://github.com/mosaicml/composer/blob/dev/composer/trainer/trainer.py#L1004

For FSDP, it becomes much more complicated... I think you can maybe just recreate the optimizer and repoint it at the model. I'm happy to scope this part out in more detail later.

I would recommend a series of 3 PRs:

  1. Add optimizers for the no-parallelism and DDP cases (just update State?) along with a unit test verifying it works. Raise a ValueError if DeepSpeed or FSDP are enabled and say it's not supported yet (see the sketch after this list).
  2. Add DeepSpeed support
  3. Add FSDP support
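Not a drop-in patch, but a rough sketch of what item 1 could look like inside Trainer.fit. The State attribute names (state.deepspeed_enabled, state.fsdp_enabled, state.optimizers) and the ensure_tuple helper are my best guesses from the current codebase, so treat everything here as an assumption to verify against dev.

```python
# Sketch only: attribute and helper names are assumptions, not a drop-in patch.
from typing import Optional, Sequence, Union

import torch

from composer.utils import ensure_tuple  # assumed helper, used elsewhere in Composer


class Trainer:
    def fit(
        self,
        *,
        optimizers: Optional[Union[torch.optim.Optimizer, Sequence[torch.optim.Optimizer]]] = None,
        # ... existing .fit() parameters ...
        **kwargs,
    ):
        if optimizers is not None:
            # DeepSpeed and FSDP wrap/shard the optimizer at init time, so swapping
            # it here is deferred to the follow-up PRs (items 2 and 3 above).
            if self.state.deepspeed_enabled or self.state.fsdp_enabled:
                raise ValueError(
                    'Passing `optimizers` to .fit() is not yet supported with '
                    'DeepSpeed or FSDP; please pass them to Trainer() instead.'
                )
            # Mirror the __init__ setup: point State at the new optimizer(s) so
            # schedulers, callbacks, and checkpointing all see them.
            self.state.optimizers = ensure_tuple(optimizers)
        # ... rest of the existing .fit() logic ...
```

The unit test for item 1 could then construct a Trainer without optimizers (or with a throwaway one), call .fit(optimizers=...), and assert that state.optimizers reflects the newly passed optimizer.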