A core insight here was the following.
If a model shows signs of forgetting during training (mean rollout reward trending downwards, or eval reward going down), the rollout buffer is too small. This can be fixed by increasing `n_steps_multiple` and `total_timesteps` by the same factor. I usually doubled both until it worked.
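A minimal sketch of that scaling, assuming a dict-style training config: `n_steps_multiple` and `total_timesteps` are the parameters named above, while the starting values and the `scale_up` helper are hypothetical.

```python
# Hypothetical starting config; only the two key names come from this thread.
config = dict(
    n_steps_multiple=8,        # controls rollout buffer size
    total_timesteps=5_000_000, # total training budget
)

def scale_up(cfg, factor=2):
    """Grow the rollout buffer and the training budget by the same factor,
    so the number of policy updates stays roughly constant."""
    cfg = dict(cfg)
    cfg["n_steps_multiple"] *= factor
    cfg["total_timesteps"] *= factor
    return cfg

# x2 until the mean rollout reward stops trending downwards.
config = scale_up(config)
```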
We did much more training and evaluation for the tailstorm paper draft. Activity has halted while we work on the write-up. Merging this now to avoid forgetting it later.
I've been working on reinforcement learning for a few weeks now, and the results are satisfactory.
All training runs are tracked on WandB: https://wandb.ai/tailstorm/cpr-v0.7-ppo?workspace=user-pkel
I'll do a rebase now. Commit hashes shown on WandB might refer to the branch `archive/2023-01-26_training`.