kokkos / kokkos-resilience

Resilience Extensions for Kokkos

Support VELOC_Init_single #45

Open Matthew-Whitlock opened 2 years ago

Matthew-Whitlock commented 2 years ago

Enables non-collective checkpoint recovery, although agreeing on which checkpoints are available still requires a collective operation. Also supports a mutable communicator on which to perform checkpoint/recovery, which enables online process recovery.
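To illustrate the "collective agreement on available checkpoints" step, here is a minimal sketch. It is not the code in this PR: the helper name `latest_common_version` is hypothetical, and the ranks are simulated with a plain vector instead of an MPI communicator. It assumes each rank knows the set of checkpoint versions it can read locally, and picks the newest version that every rank has.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper (not part of kokkos-resilience or VeloC).
// per_rank[r] holds the checkpoint versions that simulated rank r can read.
// Returns the newest version present on every rank, or std::nullopt if the
// ranks share no version (or if there are no ranks at all).
std::optional<int> latest_common_version(
    const std::vector<std::set<int>>& per_rank) {
  if (per_rank.empty()) return std::nullopt;

  // Start from rank 0's versions and intersect with each remaining rank.
  std::set<int> common = per_rank.front();
  for (std::size_t r = 1; r < per_rank.size(); ++r) {
    std::set<int> next;
    std::set_intersection(common.begin(), common.end(),
                          per_rank[r].begin(), per_rank[r].end(),
                          std::inserter(next, next.begin()));
    common = std::move(next);
  }

  if (common.empty()) return std::nullopt;
  return *common.rbegin();  // std::set is ordered, so this is the largest.
}
```

In a real run the intersection would be computed with a collective on the current communicator; once every rank knows the agreed version, each one could restore its own data independently, which is the non-collective part described above.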

Pending merge of 40-update-code-to-work-with-newer-versions-of-kokkos-develop