Closed b-chu closed 1 week ago
@b-chu about the new API, couple questions:
train_loader:
  <some params>
callbacks:
  curriculum_learning:
    duration: 5000000tok
    schedule:
    - duration: 5000000tok
      train_loader:
        <some params>
    - duration: 5000000tok
      train_loader:
        <some params>
Is train_loader still specified as a top-level entry?
And the duration specified is for the top-level train_loader?
Also, I'm worried about the loss curves in the plots you shared; they don't look fully deterministic to me. What model size and batch size were you running at, and with which datasets? Longer training runs with a bigger model and a small batch size, without shuffling, would help us determine whether the loss curves are actually deterministic. Looking at just the first few steps, most training runs will look pretty similar regardless of the data ordering.
Yes, this needs a composer release. I'll rerun CI/CD after that release and before merging. Yes, train_loader is still specified, and curriculum_learning.duration is its duration. We discussed this offline with the data team, and they'll try the callback later when doing a longer training run. I think there are slight RNG discrepancies when running on interactive, but compared to a run with no CL callback, the new callback matches the loss exactly while the old callback is slightly different. Also, when comparing two different datasets/splits, the loss difference is much greater than in the plots above.
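To make the duration semantics above concrete, here is a minimal Python sketch of how such a config could be read: the top-level train_loader runs for curriculum_learning.duration, then each schedule entry runs for its own duration. This is an illustration only, not the callback's actual implementation. The helper names parse_tokens and active_stage are hypothetical, only "tok" durations are handled, and staying on the last entry once the schedule is exhausted is an assumption.

```python
from bisect import bisect_right
from itertools import accumulate


def parse_tokens(duration: str) -> int:
    """Parse a duration such as '5000000tok' into a token count (sketch: 'tok' only)."""
    if not duration.endswith("tok"):
        raise ValueError(f"only 'tok' durations are handled in this sketch: {duration!r}")
    return int(duration[:-3])


def active_stage(tokens_seen: int, first_duration: str, schedule: list) -> int:
    """Return which dataloader is active after `tokens_seen` tokens.

    Stage 0 is the top-level train_loader (lasting `first_duration`);
    stage i >= 1 is schedule[i - 1]. Assumption: once every duration has
    elapsed, the last stage stays active.
    """
    durations = [parse_tokens(first_duration)]
    durations += [parse_tokens(entry["duration"]) for entry in schedule]
    # Cumulative token counts at which each stage ends.
    boundaries = list(accumulate(durations))
    return min(bisect_right(boundaries, tokens_seen), len(durations) - 1)


# Mirrors the config quoted in the question: three stages of 5M tokens each.
schedule = [
    {"duration": "5000000tok", "train_loader": {}},
    {"duration": "5000000tok", "train_loader": {}},
]
print(active_stage(0, "5000000tok", schedule))
print(active_stage(5_000_000, "5000000tok", schedule))
```

Under these assumptions, the first call reports the top-level train_loader and the second reports the first schedule entry, since the boundary token is counted as the start of the next stage.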
Curriculum learning callback
Requirements
Features
Other
Manual tests
Matches old callback behavior
Resumes correctly in the middle of the schedule
Resumes correctly when new datamix added to schedule
Resumes correctly when callback added after initial training run
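The resumption cases listed above can be illustrated with a small consistency check. The sketch below assumes that a checkpoint stores the schedule it was trained with plus the index of the active entry, and that resuming is safe only if every entry up to and including the active one is unchanged. The function check_resumable and this rule are hypothetical and are not taken from the callback's code.

```python
def check_resumable(saved_schedule: list, new_schedule: list, saved_index: int) -> bool:
    """Return True if training can resume at `saved_index` using `new_schedule`.

    Assumed rule (hypothetical): entries up to and including the active one
    must match the checkpointed schedule; entries after it may be changed
    or appended freely.
    """
    if saved_index < 0 or saved_index >= len(new_schedule):
        # The active entry no longer exists in the new schedule.
        return False
    prefix_end = saved_index + 1
    return saved_schedule[:prefix_end] == new_schedule[:prefix_end]


old = [
    {"duration": "5000000tok", "train_loader": {"name": "mix_a"}},
    {"duration": "5000000tok", "train_loader": {"name": "mix_b"}},
]
# Appending a new datamix after the active entry keeps the run resumable.
extended = old + [{"duration": "5000000tok", "train_loader": {"name": "mix_c"}}]
print(check_resumable(old, extended, saved_index=1))
```

Checking only the already-consumed prefix is one way to allow a new datamix to be added mid-run while still rejecting edits to data the model has already seen.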
API
Old API:
Start a new run
Start a new run
New API: