Lightning AI is excited to announce the release of Lightning 2.1 :zap: It's the culmination of work from 79 contributors who added features, bug fixes, and documentation across more than 750 commits since v2.0.
The theme of 2.1 is "bigger, better, faster": bigger, because training large multi-billion-parameter models has become even more efficient thanks to FSDP, efficient initialization, and sharded-checkpointing improvements; better, because it's easier than ever to scale models without substantial code changes or third-party packages; and faster, because it leverages the latest hardware features to speed up training in low-bit precision through new precision plugins like bitsandbytes and Transformer Engine.
And of course, as the name implies, this release fully leverages the latest features in PyTorch 2.1 :tada:
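To give a concrete flavor of the new low-bit support, here is a minimal sketch of enabling quantized training through the bitsandbytes precision plugin in Fabric; the mode and dtype shown are just illustrative choices, so check the precision docs for the full list of options:

```python
import torch
import lightning as L
from lightning.fabric.plugins import BitsandbytesPrecision

# Quantize linear layers to 4-bit NF4 while keeping compute in bfloat16
# (mode/dtype are illustrative choices, not the only supported ones)
precision = BitsandbytesPrecision(mode="nf4", dtype=torch.bfloat16)
fabric = L.Fabric(plugins=precision)
```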
The FSDP strategy for training large, billion-parameter models gets substantial improvements and new features in Lightning 2.1, both in Trainer and Fabric (in case you didn't know, Fabric is the latest addition to the Lightning family of tools for scaling models without boilerplate code).
FSDP is now more user-friendly to configure and comes with memory-management and speed improvements, and there is a brand-new end-to-end user guide with best practices (Trainer, Fabric).
Efficient Saving and Loading of Large Checkpoints
When training large billion-parameter models with FSDP, saving and resuming training, or even just loading model parameters for finetuning, can be challenging, as users are often plagued by out-of-memory errors and speed bottlenecks.
In 2.1, we made several improvements. Starting with saving checkpoints, we added support for distributed/sharded checkpoints, enabled through the `state_dict_type` setting in the strategy (#18364, #18358):
Trainer:

```python
import lightning as L
from lightning.pytorch.strategies import FSDPStrategy

# Default used by the strategy
strategy = FSDPStrategy(state_dict_type="full")

# Enable saving distributed checkpoints
strategy = FSDPStrategy(state_dict_type="sharded")

trainer = L.Trainer(strategy=strategy)
```
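The same setting is available in Fabric. A minimal sketch, assuming Fabric's FSDPStrategy exposes the same `state_dict_type` flag:

```python
import lightning as L
from lightning.fabric.strategies import FSDPStrategy

# Enable saving distributed checkpoints with Fabric
strategy = FSDPStrategy(state_dict_type="sharded")
fabric = L.Fabric(strategy=strategy)
```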
... (truncated)
Commits
6f6c07d Revert removal of empty-parameters check for configure_optimizers() with FS...
20ce3ae docs: setting cron for periodical update tutorials (#18783)
9f17324 Update probot-check-group.yml to v5.4 (#18782)
c5e3c45 Save ModelCheckpoint's last.ckpt as symlink if possible (#18748)
7434c47 Raise an exception when calling fit twice with spawn (#18776)
5a83f54 Minor strategy fixes [TPU] (#18774)
4df6e13 Update version and changelog (#18767)
83abe5e Bugfix: Pin lightning-cloud version (#18778)
27ad9e9 xfail collective tests (#18779)
c39f680 Fix deletion of resumed checkpoints (#18750)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)