Explore Improvements to DT Training Procedure

jbloomAus commented 1 year ago

Just wanted to have a meta card to track progress on these things, with links:

[x] LayerNorm (I'll probably only try layernorm pre) (https://github.com/jbloomAus/DecisionTransformerInterpretability/issues/52)
[x] AdamW Optimizer
[x] Adding a warmup stage with LambdaLR scheduler or cosine annealing
[x] Implement gated MLPs (https://arxiv.org/pdf/2002.05202.pdf). Might need to be done in TransformerLens.
[x] Make it possible to use GELU instead of ReLU (try that out as well).
[x] Better encode state. https://github.com/jbloomAus/DecisionTransformerInterpretability/issues/61
[x] Look into current init ranges for all the model components and consider proper init ranges
[x] Look into where all the parameters are and consider how we can make a sparser model
[x] Implement wandb sweeps for DT training (a card for this likely already exists, so I should find it)
[ ] Implement masking rather than just having different tokens during padding. Might be important?

If we've implemented all of those and still have no success with the memory env training, possibly try much longer training runs or more variable sampling methods, or ask for advice (or go bug hunting).
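
For reference, the gated MLP item follows the GLU-variant feed-forward layer from the linked paper, where one projection is passed through an activation and multiplied elementwise with a second projection before the output projection. Below is a minimal, dependency-free sketch of that arithmetic for a single input vector. The function and weight names (`gated_mlp`, `w_gate`, `w_in`, `w_out`) are made up for illustration and are not the repo's or TransformerLens's API; biases are omitted, as in the paper.

```python
import math


def gelu(z):
    # Exact GELU, written with the Gaussian CDF via erf.
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def matvec(x, w):
    # x has n_in entries; w is a list of n_in rows, each with n_out columns.
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def gated_mlp(x, w_gate, w_in, w_out, act=gelu):
    # Hypothetical helper: out = (act(x @ w_gate) * (x @ w_in)) @ w_out
    gate = [act(g) for g in matvec(x, w_gate)]      # activated gate branch
    value = matvec(x, w_in)                          # linear branch
    hidden = [g * v for g, v in zip(gate, value)]    # elementwise gating
    return matvec(hidden, w_out)
```

Swapping `act` between `gelu` and a ReLU gives the GEGLU and ReGLU variants respectively, which also covers the "GeLU not ReLU" item at the level of this sketch.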

jbloomAus commented 1 year ago

LN and Adam done. No clear benefit on the smaller model. I think I'll get everything implemented, then set off some sweeps tomorrow or the next day with the memory environment.

jbloomAus commented 1 year ago

Done with the LR scheduling stuff: https://github.com/users/jbloomAus/projects/1/views/1?pane=issue&itemId=27012682
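
As an illustration of what a warmup stage followed by cosine annealing can look like, here is a rough sketch of a learning-rate multiplier function. It is not the code from the linked card; the name `warmup_cosine_multiplier` and its parameters are assumptions chosen for the example.

```python
import math


def warmup_cosine_multiplier(step, warmup_steps, total_steps, min_ratio=0.0):
    """Return the factor that scales the base learning rate at `step`."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warmup from 1/warmup_steps up to 1.0.
        return (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine decay from 1.0 down to min_ratio.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_ratio + (1.0 - min_ratio) * cosine
```

A function of this shape can be passed to PyTorch as `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda step: warmup_cosine_multiplier(step, 1000, 100000))`, where the step counts are placeholder values rather than tuned settings.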

jbloomAus commented 1 year ago

I'm going to add a task here for setting up wandb sweeps. I think given the stuff I've added, it's important to just get a better sense of the right hyperparameters I need.
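
For context, a wandb sweep is driven by a configuration dictionary that names a search method, a metric to optimize, and the hyperparameter ranges. The sketch below shows the general shape of one covering the options from the checklist above. Every parameter name, range, and the metric name are hypothetical placeholders, not the values used in this repo.

```python
# Hypothetical sweep configuration; names and ranges are illustrative only.
sweep_config = {
    "method": "bayes",  # wandb also supports "grid" and "random"
    "metric": {
        "name": "eval/mean_reward",  # assumed metric name, must match what training logs
        "goal": "maximize",
    },
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-5,
            "max": 1e-2,
        },
        "weight_decay": {"values": [0.0, 0.01, 0.1]},
        "warmup_steps": {"values": [0, 500, 2000]},
        "activation": {"values": ["relu", "gelu"]},
        "layer_norm": {"values": ["none", "pre"]},
    },
}
```

A dictionary like this would be registered with `wandb.sweep(sweep=sweep_config, project=...)`, and the returned sweep ID handed to `wandb.agent(sweep_id, function=train, count=...)`, where `train` stands in for whatever function runs one training job.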

jbloomAus commented 1 year ago

I just had a lightbulb moment relating to #61 so I'm going to do that really quick before I attempt wandb sweeps.

jbloomAus commented 1 year ago

Converting "Implement masking rather than just having different tokens during padding" to its own card.
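
To make the masking idea concrete: instead of relying on a distinct padding token, the attention pattern itself can block padded positions. Below is a small pure-Python sketch that combines a causal mask with a padding mask. It assumes sequences are left-padded to `max_len`, which may not match how this repo pads, and `build_attention_mask` is a made-up name for the example.

```python
def build_attention_mask(lengths, max_len):
    """Return masks[b][q][k], True where query q may attend to key k.

    Assumes left padding: the first (max_len - length) positions are padding.
    """
    masks = []
    for length in lengths:
        n_pad = max_len - length
        is_real = [pos >= n_pad for pos in range(max_len)]
        mask = [
            [
                # Causal (k <= q) and the key is a real token, or the diagonal.
                # Keeping the diagonal means no row is fully masked, which
                # would otherwise give NaNs after a softmax.
                (k <= q and is_real[k]) or k == q
                for k in range(max_len)
            ]
            for q in range(max_len)
        ]
        masks.append(mask)
    return masks
```

Outputs at padded query positions are still meaningless under this scheme, so they would also need to be excluded from the loss.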

jbloomAus commented 1 year ago

Closing this. Got working agents!

jbloomAus / DecisionTransformerInterpretability

Explore Improvements to DT Training Procedure #53