nrontsis / PILCO

Bayesian Reinforcement Learning in Tensorflow
MIT License

pendulum_swing_up.py example doesn't solve the task #26

Closed: JoeMWatson closed this issue 4 years ago

JoeMWatson commented 5 years ago

[Attached plot: Rollout_7]

Attached is a plot of the controller's time-domain performance after 8 episodes. As you can see, it does not come close to stabilizing the pendulum around 0 or ±2π.
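To make "stabilizing around 0 or ±2π" concrete, one option is to wrap the angle into [-π, π) and check that the tail of a rollout stays near zero. This is a minimal sketch, assuming the rollout is available as a list of pendulum angles in radians and that "upright" means any multiple of 2π, as the plot description implies. The helper names, `tol`, and `window` are hypothetical and are not part of PILCO or the example script.

```python
import math
from typing import Sequence


def wrap_to_pi(theta: float) -> float:
    """Map an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


def is_stabilized(angles: Sequence[float], tol: float = 0.1, window: int = 10) -> bool:
    """Hypothetical success check: True if the last `window` angles all lie
    within `tol` radians of the nearest multiple of 2*pi (0, +/-2pi, ...)."""
    if len(angles) < window:
        return False  # rollout too short to judge
    tail = angles[-window:]
    return all(abs(wrap_to_pi(a)) <= tol for a in tail)


# Settles at +2pi, which counts as upright after wrapping.
settled = [0.5, 0.2] + [2 * math.pi] * 10
# Stuck hanging at pi, which is never within tol of a multiple of 2pi.
hanging = [math.pi] * 20

print(is_stabilized(settled), is_stabilized(hanging))
```

Wrapping before comparing means a controller that swings up through a full revolution and balances at +2π or -2π is treated the same as one that balances at 0.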

kyr-pol commented 5 years ago

Hey @JoeMWatson, thanks for pointing this out. I'll look into it. I had previously run about 20 random seeds and it seemed quite stable. There is always a chance that learning will diverge, but in these examples and with the chosen parameters, performance has been consistently good. I'll rerun some of these and try another system too.

Could you share some details about your system, e.g. your TensorFlow and GPflow versions?

Some things to check, if you want: Can you replicate the problem? Does it persist across random seeds (the random seed shouldn't be an issue, but...)? Did PILCO reach a reasonable solution in any of the earlier episodes and then diverge, or did it never get off the ground?

JoeMWatson commented 5 years ago

Upon further investigation, it seems the random seed affects how many iterations are required for convergence. A random seed of 0 (the current value on master) needed 12 iterations, but the script on master only runs 8; meanwhile, a random seed of 1 required only 3!

Maybe this sensitivity should be documented, and a better-performing seed chosen for the example script.
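One way to document this sensitivity is to sweep several seeds and record the first episode at which the task is solved, rather than relying on a single seed. The sketch below assumes a hypothetical `run_trial(seed, max_episodes)` callable that trains from scratch with the given seed and returns one success flag per episode. Nothing here is part of the PILCO repository, and `toy_trial` is a synthetic stand-in used only so the sketch runs. It does not reproduce any real PILCO results.

```python
import random
from typing import Callable, Dict, Iterable, Optional, Sequence


def episodes_to_success(successes: Iterable[bool]) -> Optional[int]:
    """Return the 1-based index of the first successful episode, or None."""
    for i, ok in enumerate(successes, start=1):
        if ok:
            return i
    return None


def seed_sweep(
    run_trial: Callable[[int, int], Sequence[bool]],
    seeds: Iterable[int],
    max_episodes: int,
) -> Dict[int, Optional[int]]:
    """For each seed, run one trial of up to `max_episodes` episodes and
    record when the task was first solved (None = not within budget)."""
    return {s: episodes_to_success(run_trial(s, max_episodes)) for s in seeds}


def toy_trial(seed: int, max_episodes: int) -> Sequence[bool]:
    """Synthetic stand-in, NOT a real PILCO run: success begins at a
    seed-dependent episode drawn uniformly from 1..15."""
    first = random.Random(seed).randint(1, 15)
    return [ep >= first for ep in range(1, max_episodes + 1)]


results = seed_sweep(toy_trial, seeds=range(5), max_episodes=8)
for seed, n in results.items():
    status = f"solved at episode {n}" if n else "not solved within budget"
    print(f"seed={seed}: {status}")
```

Reporting the per-seed table makes it visible when a fixed episode budget (8 in the current script) is too short for some seeds.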

kyr-pol commented 4 years ago

After recent updates and testing, performance is quite consistent across 10 random seeds, so I am closing this.