google / paxml

Pax is a JAX-based machine learning framework for training large-scale models. Pax allows for advanced and fully configurable experimentation and parallelization, and has demonstrated industry-leading model FLOP utilization rates.
Apache License 2.0

How to continue training from a checkpoint? #37

Open lkm2835 opened 1 year ago

lkm2835 commented 1 year ago

I am trying to continue training my model from a checkpoint using paxml.

Does paxml not support restore_checkpoint_dir or restore_checkpoint_step for train mode?