google / orbax

Orbax provides common checkpointing and persistence utilities for JAX users
https://orbax.readthedocs.io/
Apache License 2.0

Fix slice selection logic in emergency checkpointing, so that a slice with a complete local checkpoint (if it exists) is always chosen as the secondary slice. Additionally, ensure that the `ArrayHandler` used to save persistent checkpoints is configured with the correct `primary_host`. #1221

Closed: copybara-service[bot] closed this 1 month ago

copybara-service[bot] commented 1 month ago

Fix the slice selection logic in emergency checkpointing so that a slice with a complete local checkpoint, if one exists, is always chosen as the secondary slice. Additionally, ensure that the `ArrayHandler` used to save persistent checkpoints is configured with the correct `primary_host`.
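
For readers unfamiliar with the selection rule described above, here is a minimal sketch of the idea. It is not the code from this change and does not use the Orbax API. The function name `select_secondary_slice`, its parameters, the assumption that the primary slice is excluded from consideration, and the fallback to the lowest-numbered remaining slice are all hypothetical and chosen only for illustration.

```python
from typing import Mapping, Optional, Set


def select_secondary_slice(
    slice_hosts: Mapping[int, Set[int]],
    hosts_with_local_checkpoint: Set[int],
    primary_slice: int,
) -> Optional[int]:
    """Hypothetical helper (not part of Orbax) that picks a secondary slice.

    Args:
      slice_hosts: maps each slice id to the ids of the hosts in that slice.
      hosts_with_local_checkpoint: hosts that hold their local checkpoint shard.
      primary_slice: slice id to exclude (an assumption made for this sketch).

    Returns:
      The id of a slice whose hosts all hold a local checkpoint, if one exists.
      Otherwise the lowest-numbered non-primary slice, or None if there is no
      other slice.
    """
    # Sort the candidates so the result is deterministic on every host.
    candidates = sorted(s for s in slice_hosts if s != primary_slice)
    if not candidates:
        return None

    for slice_id in candidates:
        hosts = slice_hosts[slice_id]
        # A slice is "complete" only if every one of its hosts has a shard.
        if hosts and hosts <= hosts_with_local_checkpoint:
            return slice_id

    # No slice is complete: fall back to a default choice (assumed here).
    return candidates[0]
```

With three slices of two hosts each, where only slice 2 has a shard on every host, the sketch selects slice 2 even though slice 1 has a lower id. Checking completeness before applying any default ordering is the point of the rule: a partially populated slice should never win over a complete one.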