Closed gordicaleksa closed 1 week ago
So I think we'll want to identify the scheduler using a string e.g. "cosine" from argparse, and inside the schedulers file should strcmp
and build the right one.
Note there's also a previous PR on learning rate schedulers, including one that is trapezoidal.
Argh apologies i saw your todo list and just picked it up assuming no one has done it, plus it's such an orthogonal contribution i thought it would have been merged by now - will check the PRs before submitting the next time :)
https://github.com/karpathy/llm.c/pull/508 <- is this the one?
Yes i think it was mine, but your PR is much nicer nw aha. It can be nice to identify the schedule from argparse, and if we use trapezoidal, you also have to define the number of decay steps that you want. I can close mine and we can add this features to this one?
Cool!
I can quickly implement the identification via a string. And happy to add trapezoidal as well. Let's see what Andrej thinks about the next steps.
Pushed a very ugly switching between the schedulers based on the input argument.
In order to make this closer to what we're used to in Python/C++ we basically have to implement a mini virtual function pointer logic in order to allow for object oriented features.
Happy to do it, and won't be hard, just wanted to double check the decision makes sense because we'll be dealing with func pointers and i don't think we've done previously in the codebase.
cc: @karpathy
It would look something along the lines of:
typedef struct LRScheduler {
void (*initialize)(struct LRScheduler* self, double initial_rate);
void (*step)(struct LRScheduler* self);
double (*get_rate)(struct LRScheduler* self);
double current_rate;
} LRScheduler;
too much pointers? :')
forked this work into new PR https://github.com/karpathy/llm.c/pull/627 made simplifications added wsd closing this one
Refactored the learning rate schedulers code - we'll keep all definitions inside "schedulers.h" as per our offline discussion.
Supported LR schedulers: