Lightning AI is excited to announce the release of Lightning 2.1 :zap: It's the culmination of work from 79 contributors who added features, bug fixes, and documentation across more than 750 commits since v2.0.
The theme of 2.1 is "bigger, better, faster": bigger, because training large multi-billion-parameter models has become even more efficient thanks to FSDP, efficient initialization, and sharded-checkpointing improvements; better, because it's easier than ever to scale models without substantial code changes or third-party packages; and faster, because it leverages the latest hardware features to speed up training in low-bit precision through new precision plugins like bitsandbytes and Transformer Engine.
And of course, as the name implies, this release fully leverages the latest features in PyTorch 2.1 :tada:
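To give a concrete flavor of the new low-bit support, here is a minimal sketch of enabling quantized training through the bitsandbytes precision plugin in Fabric; the mode and dtype shown are just illustrative choices, so check the precision docs for the full list of options:

```python
import torch
import lightning as L
from lightning.fabric.plugins import BitsandbytesPrecision

# Quantize linear layers to 4-bit NF4 while keeping compute in bfloat16
# (mode/dtype are illustrative choices, not the only supported ones)
precision = BitsandbytesPrecision(mode="nf4", dtype=torch.bfloat16)
fabric = L.Fabric(plugins=precision)
```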
The FSDP strategy for training large, billion-parameter models gets substantial improvements and new features in Lightning 2.1, both in Trainer and Fabric (in case you didn't know, Fabric is the latest addition to the Lightning family of tools for scaling models without boilerplate code).
FSDP is now more user-friendly to configure and comes with memory-management and speed improvements, and there is a brand-new end-to-end user guide with best practices (Trainer, Fabric).
Efficient Saving and Loading of Large Checkpoints
When training large billion-parameter models with FSDP, saving and resuming training, or even just loading model parameters for finetuning, can be challenging, as users are often plagued by out-of-memory errors and speed bottlenecks.
In 2.1, we made several improvements. Starting with saving checkpoints, we added support for distributed/sharded checkpoints, enabled through the `state_dict_type` setting in the strategy (#18364, #18358):
Trainer:

```python
import lightning as L
from lightning.pytorch.strategies import FSDPStrategy

# Default used by the strategy
strategy = FSDPStrategy(state_dict_type="full")

# Enable saving distributed checkpoints
strategy = FSDPStrategy(state_dict_type="sharded")

trainer = L.Trainer(strategy=strategy)
```
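The same setting is available in Fabric. A minimal sketch, assuming Fabric's FSDPStrategy exposes the same `state_dict_type` flag:

```python
import lightning as L
from lightning.fabric.strategies import FSDPStrategy

# Enable saving distributed checkpoints with Fabric
strategy = FSDPStrategy(state_dict_type="sharded")
fabric = L.Fabric(strategy=strategy)
```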
... (truncated)
Commits
6f6c07d Revert removal of empty-parameters check for configure_optimizers() with FS...
20ce3ae docs: setting cron for periodical update tutorials (#18783)
9f17324 Update probot-check-group.yml to v5.4 (#18782)
c5e3c45 Save ModelCheckpoint's last.ckpt as symlink if possible (#18748)
7434c47 Raise an exception when calling fit twice with spawn (#18776)
5a83f54 Minor strategy fixes [TPU] (#18774)
4df6e13 Update version and changelog (#18767)
83abe5e Bugfix: Pin lightning-cloud version (#18778)
27ad9e9 xfail collective tests (#18779)
c39f680 Fix deletion of resumed checkpoints (#18750)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)