lessw2020 / Ranger21

Ranger deep learning optimizer rewrite to use newest components
Apache License 2.0
321 stars, 45 forks

resuming training with ranger21? #18

Open neuronflow opened 3 years ago

neuronflow commented 3 years ago

As I understand it, Ranger21 handles learning-rate scheduling (among other things) internally.

How should training be resumed? Is there a state dict that needs to be saved and loaded?

lessw2020 commented 3 years ago

Hi @neuronflow, thanks for opening the issue! Ranger21 does maintain a basic state dict, but we definitely need to extend it with some additional data to ensure a clean restart if training is stopped. Let me use this issue to track it. This has been on my to-do list, and I'll test and fix it, ideally in the next few days.
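
To illustrate why the extra data matters, here is a minimal, self-contained sketch in plain Python. `ToyScheduledSGD` is a hypothetical stand-in, not Ranger21's code: it has a built-in linear warmup driven by an internal step counter, and its `state_dict()` / `load_state_dict()` methods only mirror the method names used by PyTorch optimizers. Which fields Ranger21 itself would need to save is not specified in this thread; the sketch only shows that whatever drives the internal schedule (here, `step_count`) has to be part of the checkpoint.

```python
import json


class ToyScheduledSGD:
    """Hypothetical toy optimizer with a built-in warmup schedule (not Ranger21)."""

    def __init__(self, base_lr=0.1, warmup_steps=5):
        self.base_lr = base_lr
        self.warmup_steps = warmup_steps
        self.step_count = 0  # drives the internal schedule

    def current_lr(self):
        # Linear warmup: ramp up to base_lr over the first warmup_steps steps.
        scale = min(1.0, (self.step_count + 1) / self.warmup_steps)
        return self.base_lr * scale

    def step(self, weight, grad):
        new_weight = weight - self.current_lr() * grad
        self.step_count += 1
        return new_weight

    def state_dict(self):
        # Everything needed to continue the schedule where it left off.
        return {
            "base_lr": self.base_lr,
            "warmup_steps": self.warmup_steps,
            "step_count": self.step_count,
        }

    def load_state_dict(self, state):
        self.base_lr = state["base_lr"]
        self.warmup_steps = state["warmup_steps"]
        self.step_count = state["step_count"]


def train(opt, weight, n_steps):
    # Minimize f(w) = w**2, whose gradient is 2 * w.
    for _ in range(n_steps):
        weight = opt.step(weight, grad=2.0 * weight)
    return weight


# Reference: 10 uninterrupted steps.
w_full = train(ToyScheduledSGD(), 1.0, 10)

# Interrupted run: 4 steps, then write a checkpoint (JSON stands in for a file).
opt_a = ToyScheduledSGD()
w_partial = train(opt_a, 1.0, 4)
checkpoint = json.dumps({"weight": w_partial, "optimizer": opt_a.state_dict()})

# Resume correctly: restore the weight AND the optimizer state.
restored = json.loads(checkpoint)
opt_b = ToyScheduledSGD()
opt_b.load_state_dict(restored["optimizer"])
w_resumed = train(opt_b, restored["weight"], 6)

# Resume naively: restore only the weight, so the warmup starts over.
opt_c = ToyScheduledSGD()
w_naive = train(opt_c, restored["weight"], 6)

print("uninterrupted:", w_full)
print("resumed with optimizer state:", w_resumed)
print("resumed without optimizer state:", w_naive)
```

In this sketch, the run restored from the checkpoint matches the uninterrupted run, while the run that skips `load_state_dict` repeats the warmup and ends at a different weight.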

neuronflow commented 2 years ago

Any updates on this one? :) I lost multiple GPU-days of training because runs are not resumable :/

Elevory commented 2 years ago

Seconding the need for this feature!