Closed jbloomAus closed 1 year ago
LN and Adam done. No clear benefit on the smaller model. I think I'll get everything implemented, then set off some sweeps tomorrow or the next day with the memory env.
done lr scheduling stuff: https://github.com/users/jbloomAus/projects/1/views/1?pane=issue&itemId=27012682
I'm going to add a task here for setting up wandb sweeps. I think given the stuff I've added, it's important to just get a better sense of the right hyperparameters I need.
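For reference, a wandb sweep setup could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the metric name `eval/mean_reward`, the parameter names and ranges, the project name `memory-env-sweeps`, and the `run_training` stand-in are all hypothetical and not taken from this repo. Only the `wandb.sweep`, `wandb.agent`, `wandb.init`, `wandb.config` and `wandb.log` calls are part of the real wandb API.

```python
# Hypothetical sweep configuration: names and ranges are illustrative only.
sweep_config = {
    "method": "bayes",  # wandb also supports "grid" and "random"
    "metric": {"name": "eval/mean_reward", "goal": "maximize"},
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-5,
            "max": 1e-2,
        },
        "batch_size": {"values": [32, 64, 128]},
        "warmup_steps": {"values": [0, 500, 2000]},
        "layer_norm": {"values": [True, False]},
    },
}


def run_training(learning_rate, batch_size, warmup_steps, layer_norm):
    """Stand-in for the real training loop (hypothetical); returns a reward."""
    return 0.0  # placeholder value, not a real result


def train():
    """One sweep trial: read the sampled hyperparameters, train, log the metric."""
    import wandb  # imported lazily so the config above is usable without wandb

    wandb.init()
    cfg = wandb.config
    reward = run_training(
        cfg.learning_rate, cfg.batch_size, cfg.warmup_steps, cfg.layer_norm
    )
    wandb.log({"eval/mean_reward": reward})


def launch(project="memory-env-sweeps", count=20):
    """Register the sweep and run `count` trials in this process."""
    import wandb

    sweep_id = wandb.sweep(sweep=sweep_config, project=project)
    wandb.agent(sweep_id, function=train, count=count)
    return sweep_id
```

Calling `launch()` would start the sweep. A log-uniform range for the learning rate is a common default when the right order of magnitude is unknown, but the bounds here are guesses.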
I just had a lightbulb moment relating to #61 so I'm going to do that really quick before I attempt wandb sweeps.
Converting "Implement masking rather than just having different tokens during padding" to its own card.
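To illustrate what masking instead of dedicated padding tokens might mean, here is a minimal pure-Python sketch. It assumes left padding and a per-token loss, neither of which is confirmed by this thread; `pad_and_mask`, `masked_mean` and `PAD_TOKEN` are hypothetical names. The idea is that a 0/1 mask, not the padding token's identity, decides which positions count.

```python
PAD_TOKEN = 0  # assumption: any id works, because the mask marks the padding


def pad_and_mask(sequences, max_len, pad_token=PAD_TOKEN):
    """Left-pad each sequence to max_len and return (padded, masks).

    A mask entry is 1 for a real token and 0 for padding.
    """
    padded, masks = [], []
    for seq in sequences:
        seq = list(seq)[-max_len:]  # keep the most recent max_len tokens
        n_pad = max_len - len(seq)
        padded.append([pad_token] * n_pad + seq)
        masks.append([0] * n_pad + [1] * len(seq))
    return padded, masks


def masked_mean(per_token_loss, masks):
    """Average the loss over real (unmasked) positions only."""
    total = sum(
        loss * m
        for loss_row, mask_row in zip(per_token_loss, masks)
        for loss, m in zip(loss_row, mask_row)
    )
    count = sum(sum(mask_row) for mask_row in masks)
    return total / count if count else 0.0
```

With this approach the padding id can coincide with a real token id without affecting the loss, since padded positions contribute nothing. The same mask would typically also be passed to the attention layers so that real tokens do not attend to padding.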
Closing this. Got working agents!
Just wanted to have a meta card to track progress on these things with links:
If we've implemented all of those and still no success with the memory env training, possibly try either much longer training runs, more variable sampling methods, or ask for advice (or go bug hunting).