pkel / cpr

consensus protocol research

Reinforcement Learning #40

Closed pkel closed 1 year ago

pkel commented 1 year ago

I have been working on reinforcement learning for a few weeks. The results are now satisfactory.

All training runs are tracked on WandB: https://wandb.ai/tailstorm/cpr-v0.7-ppo?workspace=user-pkel

I'll do a rebase now. Commit hashes shown on WandB may refer to the branch archive/2023-01-26_training.

pkel commented 1 year ago

A core insight here was the following.

If a model shows signs of forgetting during training (mean rollout reward trending downwards or eval reward going down), then the rollout buffer is too small. This can be fixed by increasing n_steps_multiple and total_timesteps by the same factor. I usually doubled both until it worked.
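A minimal sketch of that doubling rule, for illustration only. The `PPOConfig` class and the `scale_rollout` and `doubling_schedule` helpers are hypothetical and not part of cpr; only the names n_steps_multiple and total_timesteps come from this thread. It assumes the rollout buffer size is proportional to n_steps_multiple, so scaling both values by the same factor keeps the number of policy updates per run unchanged.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PPOConfig:
    # Hypothetical config holder; the real training config has more fields.
    n_steps_multiple: int  # assumed to scale the rollout buffer size
    total_timesteps: int   # total environment steps for the training run


def scale_rollout(cfg: PPOConfig, factor: int = 2) -> PPOConfig:
    """Scale the rollout buffer and the training budget by the same factor."""
    return replace(
        cfg,
        n_steps_multiple=cfg.n_steps_multiple * factor,
        total_timesteps=cfg.total_timesteps * factor,
    )


def doubling_schedule(cfg: PPOConfig, attempts: int) -> list[PPOConfig]:
    """Configs to try in order: the original, then repeated doublings."""
    schedule = [cfg]
    for _ in range(attempts):
        schedule.append(scale_rollout(schedule[-1]))
    return schedule


if __name__ == "__main__":
    # Placeholder numbers, not values from the actual training runs.
    for candidate in doubling_schedule(PPOConfig(10, 1_000_000), attempts=3):
        print(candidate)
```

Each config in the schedule would be trained in turn, stopping at the first one whose rollout and eval rewards no longer trend downwards.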

pkel commented 1 year ago

We did much more training and evaluation for the tailstorm paper draft. Activity has halted since we're working on the write-up. Merging this now to avoid forgetting it later.