nrontsis / PILCO

Bayesian Reinforcement Learning in Tensorflow
MIT License

Gradient based policy optimisation. #41

Closed patxikuku closed 4 years ago

patxikuku commented 4 years ago

Hello,

If I understood correctly, the authors of PILCO use a gradient-based method for optimising the policy. That doesn't seem to be the case in the current implementation: you use L-BFGS-B without supplying the computation of the Jacobian.

Have you run any experiments using a gradient-based method?

nrontsis commented 4 years ago

Gradients are computed and used in L-BFGS-B. This is the whole point of using TensorFlow. Perhaps this is not immediately obvious when examining the code, because the gradient computation is handled via GPflow.
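To illustrate the point (this is a sketch, not the PILCO code): L-BFGS-B uses gradients whenever they are supplied through the `jac` argument of `scipy.optimize.minimize`. In this repo the gradient is produced by TensorFlow's automatic differentiation and handed to SciPy by GPflow's optimizer wrapper; below, a hand-written gradient of a toy quadratic stands in for the autodiff one.

```python
import numpy as np
from scipy.optimize import minimize

def objective(x):
    # Toy objective: sum of squared distances to 3.0
    return float(np.sum((x - 3.0) ** 2))

def gradient(x):
    # Hand-written here for illustration; in PILCO this gradient is
    # computed by TensorFlow's automatic differentiation instead.
    return 2.0 * (x - 3.0)

# L-BFGS-B consumes the exact gradient via `jac` -- no finite differences.
res = minimize(objective, x0=np.zeros(2), jac=gradient, method="L-BFGS-B")
print(res.x)  # converges to [3.0, 3.0]
```

The optimiser interface is identical either way; only the source of the gradient differs.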

fuku10 commented 2 years ago

Hello, does it mean using a numerically calculated gradient, not an analytically calculated gradient?

nrontsis commented 2 years ago

It’s neither, it’s via automatic differentiation.
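For intuition, here is a minimal forward-mode automatic differentiation sketch using dual numbers (illustrative only; TensorFlow uses reverse mode, but the principle is the same). Derivatives are propagated exactly through each elementary operation, so the result is exact to machine precision, unlike a finite-difference estimate with its step-size error.

```python
class Dual:
    """A value paired with its derivative, propagated by arithmetic rules."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x   # f'(x) = 6x + 2

y = f(Dual(4.0, 1.0))          # seed the input's derivative with 1.0
print(y.val, y.der)            # 56.0 26.0 -- exact, no step-size error
```

This is neither symbolic (no expression is ever built) nor numerical (no perturbed re-evaluations): the derivative is accumulated alongside the value.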

fuku10 commented 2 years ago

I thought minimize() automatically calculates the gradient using the finite-difference method. (In the case of scipy.optimize.minimize: "If None or False, the gradient will be estimated using 2-point finite difference estimation with an absolute step size.") https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html

Anyway, I'll study TensorFlow and GPflow. Thanks!