houminz/paper-reading
Paper Reading: covering distributed systems, virtualization, networking, and machine learning
https://houmin.cc/papers
22 stars, 0 forks
Gemini: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints
#24 (Open), opened by houminz 4 months ago
houminz commented 4 months ago
Paper: https://zhuangwang93.github.io/docs/Gemini_SOSP23.pdf