karpathy / llm.c

LLM training in simple, raw C/CUDA
MIT License
21.28k stars 2.31k forks source link

Add learning rate schedulers #605

Closed gordicaleksa closed 1 week ago

gordicaleksa commented 1 week ago

Refactored the learning rate schedulers code - we'll keep all definitions inside "schedulers.h" as per our offline discussion.

Supported LR schedulers:

karpathy commented 1 week ago

So I think we'll want to identify the scheduler using a string e.g. "cosine" from argparse, and inside the schedulers file should strcmp and build the right one.

karpathy commented 1 week ago

Note there's also a previous PR on learning rate schedulers, including one that is trapezoidal.

gordicaleksa commented 1 week ago

Argh apologies i saw your todo list and just picked it up assuming no one has done it, plus it's such an orthogonal contribution i thought it would have been merged by now - will check the PRs before submitting the next time :)

gordicaleksa commented 1 week ago

https://github.com/karpathy/llm.c/pull/508 <- is this the one?

eliebak commented 1 week ago

Yes i think it was mine, but your PR is much nicer nw aha. It can be nice to identify the schedule from argparse, and if we use trapezoidal, you also have to define the number of decay steps that you want. I can close mine and we can add this features to this one?

gordicaleksa commented 1 week ago

Cool!

I can quickly implement the identification via a string. And happy to add trapezoidal as well. Let's see what Andrej thinks about the next steps.

gordicaleksa commented 1 week ago

Pushed a very ugly switching between the schedulers based on the input argument.

In order to make this closer to what we're used to in Python/C++ we basically have to implement a mini virtual function pointer logic in order to allow for object oriented features.

Happy to do it, and won't be hard, just wanted to double check the decision makes sense because we'll be dealing with func pointers and i don't think we've done previously in the codebase.

cc: @karpathy

gordicaleksa commented 1 week ago

It would look something along the lines of:

typedef struct LRScheduler {
    void (*initialize)(struct LRScheduler* self, double initial_rate);
    void (*step)(struct LRScheduler* self);
    double (*get_rate)(struct LRScheduler* self);
    double current_rate;
} LRScheduler;

too much pointers? :')

karpathy commented 1 week ago

forked this work into new PR https://github.com/karpathy/llm.c/pull/627 made simplifications added wsd closing this one