tensorflow / community


RFC: Checkpoint Sharding Callback #458

Closed BlaziusMaximus closed 6 months ago

BlaziusMaximus commented 7 months ago

This RFC will be open for comment until Monday, February 5th, 2024. cc @k-w-w @petrychenko

Checkpoint Sharding Callback

Status: Implemented
RFC #: 458
Author(s): Adam Cogdell (adamcogdell@google.com)
Sponsor: Ivan Petrychenko (petrychenko@google.com)
Updated: 2024-01-23

Objective

I am proposing a new callback mechanism that gives users more control over how checkpoints are sharded. This RFC publicizes the design of the new checkpointing feature, which has already been implemented (see tensorflow/python/checkpoint/sharding) but remains open to comments and changes from the open source community.
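To make the idea of a sharding callback concrete, here is a minimal pure-Python sketch. It is not the TensorFlow API: the names `Shard`, `ShardingCallback`, `max_shard_size_callback`, and `save_checkpoint` are hypothetical, and tensors are represented only by their byte sizes. It assumes the save routine hands the callback a mapping of tensor names to sizes and uses whatever shard layout the callback returns.

```python
from typing import Callable, Dict, List

# Hypothetical types for illustration only; these are not TensorFlow's API.
Shard = Dict[str, int]  # tensor name -> size in bytes
ShardingCallback = Callable[[Dict[str, int]], List[Shard]]


def max_shard_size_callback(max_bytes: int) -> ShardingCallback:
    """Build a callback that greedily packs tensors into size-limited shards."""

    def callback(tensor_sizes: Dict[str, int]) -> List[Shard]:
        shards: List[Shard] = []
        current: Shard = {}
        used = 0
        for name, size in tensor_sizes.items():
            # Start a new shard when adding this tensor would exceed the limit.
            # A tensor larger than max_bytes is not split; it gets its own shard.
            if current and used + size > max_bytes:
                shards.append(current)
                current, used = {}, 0
            current[name] = size
            used += size
        if current:
            shards.append(current)
        return shards

    return callback


def save_checkpoint(tensor_sizes: Dict[str, int],
                    sharding_callback: ShardingCallback) -> List[Shard]:
    """Stand-in for a save routine that delegates shard layout to the callback."""
    return sharding_callback(tensor_sizes)


sizes = {"a": 4, "b": 4, "c": 6, "d": 20}
layout = save_checkpoint(sizes, max_shard_size_callback(10))
```

The point of the sketch is the separation of concerns: the save routine does not decide the layout itself, so a user can swap in a different policy without touching the checkpointing code.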