maxweissenbacher opened this issue 3 months ago (status: Open)
Created a new branch to track this.
Let's use PPO for on-policy and SAC for off-policy. We can change this later if desired.
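A minimal sketch of what that choice looks like, assuming stable-baselines3 (the actual library used here isn't confirmed) and a placeholder gym env standing in for the custom KS environment:

```python
# Sketch only: stable-baselines3 is an assumption, and "Pendulum-v1"
# is a stand-in for the KS environment.
import gymnasium as gym
from stable_baselines3 import PPO, SAC

env = gym.make("Pendulum-v1")  # placeholder; swap in the KS env

# On-policy choice
ppo_agent = PPO("MlpPolicy", env, verbose=1)
ppo_agent.learn(total_timesteps=100_000)

# Off-policy choice (requires a continuous action space)
sac_agent = SAC("MlpPolicy", env, verbose=1)
sac_agent.learn(total_timesteps=100_000)
```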
I've added working PPO code for the KS environment now. I still need to check hyperparameters and add logging and evaluation. Also: we need to inject the trained model (the autoencoder) from @eliseoe into the RL agent.
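A hypothetical sketch of one way to do that injection: wrap the env so observations pass through the frozen pretrained encoder before reaching the agent. The encoder interface, latent size, and checkpoint path are all placeholders, since @eliseoe's actual model isn't shown here.

```python
# Hypothetical sketch: encoder class, latent_dim, and checkpoint path
# are placeholders; the real autoencoder interface is unknown.
import gymnasium as gym
import numpy as np
import torch

class EncodedObservation(gym.ObservationWrapper):
    def __init__(self, env, encoder, latent_dim):
        super().__init__(env)
        self.encoder = encoder.eval()  # frozen pretrained encoder
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(latent_dim,), dtype=np.float32
        )

    def observation(self, obs):
        # Encode the raw KS state into the latent space, no gradients needed
        with torch.no_grad():
            z = self.encoder(torch.as_tensor(obs, dtype=torch.float32))
        return z.numpy()

# encoder = torch.load("autoencoder.pt").encoder   # placeholder load
# env = EncodedObservation(make_ks_env(), encoder, latent_dim=8)
```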
Preliminary analysis of runs: I finished a few PPO runs (no autoencoder) and performance turns out to be very good. I suspect this is because we now use multiple parallel training environments (previously I only used one).
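For reference, a rough sketch of the parallel-environment setup, again assuming stable-baselines3's vectorized env utilities; the env id and `n_envs=8` are placeholders:

```python
# Sketch of vectorized training: 8 env copies stepped in parallel,
# so PPO collects rollouts from all of them each iteration.
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == "__main__":  # guard required for subprocess-based envs
    vec_env = make_vec_env("Pendulum-v1", n_envs=8, vec_env_cls=SubprocVecEnv)
    model = PPO("MlpPolicy", vec_env, verbose=1)
    model.learn(total_timesteps=200_000)
```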
One finding: dt=0.05 performs significantly worse than dt=0.005. With the latter, we get fast convergence even for nu=0.01!
This issue tracks initial progress on setting up the RL code.
TO-DO list: