Hello all,

I would like to resume training from the last checkpoint and the last batch ID, to handle training interruptions. I can see remnants of a possible implementation here, but they are commented out:

https://github.com/stanford-futuredata/ColBERT/blob/7be0114f00dc938aca4a3a5929bef5bbb99485e6/colbert/training/training.py#L81-L83

Also, #43 mentions that resume_optimizer is implemented; however, there is no other reference to the parsed argument:

grep -r "resume_optimizer" .
./colbert/utils/parser.py: # NOTE: Providing a checkpoint is one thing, --resume is another, --resume_optimizer is yet another.
./colbert/utils/parser.py: self.add_argument('--resume_optimizer', dest='resume_optimizer', default=False, action='store_true')

So it seems this feature was removed after those initial implementations. I tried to dig into this and found that it was removed on October 13th, 2021 (7:40 PM) by @okhat, in the commit "Initial commit with the new API and residual compression". Reference: https://github.com/stanford-futuredata/ColBERT/blame/7be0114f00dc938aca4a3a5929bef5bbb99485e6/colbert/training/training.py#L81-L83

Could you help me implement resume and resume_optimizer again? That way I can handle training interruptions in my pipeline, and also contribute back to the repository with examples.
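For reference, here is a minimal sketch of the kind of resume logic I have in mind, written as a plain PyTorch loop. All of the names here (`save_checkpoint`, `maybe_resume`, the checkpoint keys, the path) are hypothetical and not ColBERT's actual API; the sketch only illustrates saving/restoring optimizer state and fast-forwarding to the saved batch ID:

```python
# Hypothetical sketch (not ColBERT's actual API) of resume / resume_optimizer
# semantics for a plain PyTorch training loop.
import os
import torch

CHECKPOINT_PATH = "checkpoint.last.pt"  # hypothetical path

def save_checkpoint(model, optimizer, batch_idx, path=CHECKPOINT_PATH):
    # Persist everything needed to resume: model weights, optimizer state
    # (e.g. Adam's moment estimates and step count), and the batch position.
    torch.save({
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "batch_idx": batch_idx,
    }, path)

def maybe_resume(model, optimizer, resume=False, resume_optimizer=False,
                 path=CHECKPOINT_PATH):
    # Returns the batch index to resume from, or 0 for a fresh run.
    if not resume or not os.path.exists(path):
        return 0
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])
    if resume_optimizer:
        # Without this, the optimizer restarts cold even though the weights resume.
        optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["batch_idx"]

# Usage inside the training loop: fast-forward the data stream to the saved
# position, then checkpoint periodically.
#
# start_batch = maybe_resume(model, optimizer, resume=True, resume_optimizer=True)
# for batch_idx, batch in enumerate(reader):
#     if batch_idx < start_batch:
#         continue  # skip batches already trained on
#     ...train step...
#     if batch_idx % 1000 == 0:
#         save_checkpoint(model, optimizer, batch_idx + 1)
```

One caveat with this approach: fast-forwarding with `continue` assumes the data order is deterministic across runs (same seed and sampling), otherwise the skipped batches won't match the ones already trained on.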