Open jim79 opened 2 years ago
Hi @jim79, please refer to the sample checkpoint utility functions below. You may also need to handle other factors such as the LR scheduler state, which is non-trivial. We will offer this support soon. Thanks.
import json
from pathlib import Path

import nnabla as nn


def load_checkpoint(checkpoint, solver):
    r"""Load the last states of the training."""
    with open(checkpoint, 'r') as file:
        info = json.load(file)
    path = Path(info['params_path'])
    print(f"Checkpoint loading from: {str(path)}\n")
    # Restore the model parameters and the solver (optimizer) states.
    nn.load_parameters(str(path / 'model.h5'))
    solver.load_states(str(path / 'solver_states.h5'))
    return info['cur_epoch']


def save_checkpoint(path, solver, cur_epoch):
    r"""Save the current states of the training."""
    path = Path(path)
    # Dump the model parameters and the solver (optimizer) states.
    nn.save_parameters(str(path / 'model.h5'))
    solver.save_states(str(path / 'solver_states.h5'))
    with open(path / 'checkpoint.json', 'w') as f:
        json.dump(
            dict(cur_epoch=cur_epoch,
                 params_path=str(path)),
            f
        )
    print(f"Checkpoint saved: {str(path)}\n")
Thank you for the response.
Hi, how do we resume training on x-umx from a checkpoint? The --checkpoint argument (as in umx) seems to be unrecognised. Thank you