houlei0324 / EasyDML

An Easy programming tool for Distributed Machine Learning

Fault tolerant #18

Open houlei0324 opened 6 years ago

houlei0324 commented 6 years ago

Achieve fault tolerance using checkpoints, possibly based on MPI groups.
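
A minimal sketch of what per-rank checkpointing might look like, to make the idea concrete. Nothing here is taken from EasyDML: the function names `save_checkpoint`, `latest_step`, and `load_checkpoint`, the file naming scheme, and the use of `pickle` are all assumptions for illustration. The `rank` argument is a plain integer; in a real MPI job it would presumably come from the communicator, and the group-level coordination (all ranks agreeing on a common step to restart from) is not shown.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, directory, rank, step):
    """Atomically write one rank's state for a given step (hypothetical helper)."""
    os.makedirs(directory, exist_ok=True)
    final_path = os.path.join(directory, f"rank{rank}_step{step}.ckpt")
    # Write to a temporary file first, then rename, so a crash during the
    # write never leaves a half-written checkpoint under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, final_path)
    return final_path


def latest_step(directory, rank):
    """Return the newest checkpointed step for this rank, or None if none exist."""
    if not os.path.isdir(directory):
        return None
    prefix = f"rank{rank}_step"
    suffix = ".ckpt"
    steps = [
        int(name[len(prefix):-len(suffix)])
        for name in os.listdir(directory)
        if name.startswith(prefix) and name.endswith(suffix)
    ]
    return max(steps) if steps else None


def load_checkpoint(directory, rank):
    """Return (state, step) from the newest checkpoint, or (None, 0) to start fresh."""
    step = latest_step(directory, rank)
    if step is None:
        return None, 0
    path = os.path.join(directory, f"rank{rank}_step{step}.ckpt")
    with open(path, "rb") as f:
        return pickle.load(f), step
```

On restart, each worker would call `load_checkpoint(directory, rank)` and resume its loop from the returned step, calling `save_checkpoint` every few iterations. Writing one file per rank keeps the sketch simple; whether that fits an MPI-group-based design is an open question for this issue.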