nrontsis / PILCO

Bayesian Reinforcement Learning in Tensorflow
MIT License

Resetting different components of PILCO: Models, controller and reward function #13

Closed kyr-pol closed 5 years ago

kyr-pol commented 5 years ago

In many cases the user might want to re-initialise one component while keeping the rest as they are, for example:

  1. Restarting the model or the controller optimisation to avoid getting trapped in local minima. We might want to restart one of the two (keeping the other intact), restart both, or perform multiple restarts and keep the most promising version.

  2. Changing the reward function (while keeping the same model) and re-optimising the controller, so that previous episodes can be reused to solve tasks with new goals (a possible approach for gym's Reacher-v2 environment, or for transfer learning demos).

Since this might interfere with the TensorFlow graph (see https://github.com/GPflow/GPflow/issues/756 and https://github.com/GPflow/GPflow/issues/719), we might want to provide a method that takes care of it cleanly.
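
To make the second use case concrete, here is a rough sketch of what such a reset could look like. It assumes the constructor-style usage from the repo's example scripts (`PILCO(X, Y, controller=..., reward=..., horizon=...)`); the helper function, the graph reset via `gpflow.reset_default_graph_and_session()`, and the parameter copy are illustrative assumptions, not an existing API.

```python
# Illustrative sketch only, not the repo's API: rebuild PILCO on a fresh
# TensorFlow graph with the same data and controller parameters but a new
# reward, so previous episodes can be reused for a task with a new goal.
# Assumes gpflow 1.x / TF 1.x.
import gpflow
from pilco.models import PILCO  # as imported in the example scripts


def rebuild_with_new_reward(old_pilco, X, Y, make_controller, new_reward, horizon):
    # Save the trained controller's parameter values as plain numpy arrays.
    controller_params = old_pilco.controller.read_values()

    # Start from a clean graph so operations left over from previous
    # PILCO objects and optimisation runs do not keep accumulating.
    gpflow.reset_default_graph_and_session()

    # Rebuild the controller on the new graph and restore its parameters
    # (make_controller is a user-supplied factory, e.g. for an RBF controller).
    controller = make_controller()
    controller.assign(controller_params)

    # Same data and controller, new reward.
    return PILCO(X, Y, controller=controller, reward=new_reward, horizon=horizon)
```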

kyr-pol commented 5 years ago

While working on this I decided to print some extra information about the TensorFlow graph in https://github.com/nrontsis/PILCO/commit/68821119c554ed228bec3ff39d7d0167f294c2f0, and it seems that the number of operations in the graph increases on every iteration (the number of variables stays the same). I assumed this was because the number of data points increases, but after reducing the number of data points collected per run (a smaller horizon, T=10 instead of 40), the number of operations in the graph keeps increasing at the same pace (about 4000 per optimisation run). If this is unnecessary we are wasting a lot of time. Do you think it's normal @nrontsis?
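
For reference, a minimal sketch of the kind of check this refers to, using the TF 1.x graph API (the function name and where it is called from are just illustrative):

```python
import tensorflow as tf


def print_graph_stats(tag=""):
    # Operations currently in the default graph: this is the count that
    # grows by roughly 4000 per optimisation run.
    n_ops = len(tf.get_default_graph().get_operations())
    # Global variables in the graph: this count stays the same between runs.
    n_vars = len(tf.global_variables())
    print("{}: {} ops, {} variables".format(tag, n_ops, n_vars))
```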

nrontsis commented 5 years ago

Very interesting, thanks for digging it up.

It might be related to this.

I will try to investigate today (unless the usual :P )

nrontsis commented 5 years ago

@kyr-pol which TF version are you using? I am getting deprecation warnings on 6882111:

WARNING:tensorflow:From inverted_pendulum.py:142: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.

nrontsis commented 5 years ago

It seems that the problem is with gpflow's minimizer, as described here, and I think this would fix it.
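
For context, this growth pattern is what you get when a fresh optimisation tensor is built on every call to the optimiser. A rough sketch of the kind of workaround being discussed, assuming the gpflow 1.x `ScipyOptimizer` API (the caching wrapper and its names are illustrative, not the actual fix):

```python
import gpflow


class CachedModelOptimizer:
    # Build the Scipy optimisation tensor for a model once and reuse it,
    # instead of calling ScipyOptimizer.minimize(model) on every run,
    # which adds a new set of operations to the graph each time.
    def __init__(self):
        self.opt = gpflow.train.ScipyOptimizer(method='L-BFGS-B')
        self.opt_tensor = None

    def minimize(self, model):
        session = model.enquire_session()
        if self.opt_tensor is None:
            self.opt_tensor = self.opt.make_optimize_tensor(model, session)
        self.opt_tensor.minimize(session=session)
```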

kyr-pol commented 5 years ago

I am running 1.6.0; I'm getting the warning too, I just didn't notice it in the output. I'll replace the deprecated function. Good, that looks like the issue we are having, let's see which workaround is favoured.
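
For the deprecation warning, the replacement is a one-liner: `tf.all_variables` was deprecated in favour of `tf.global_variables`, so the debug printing would become something like:

```python
import tensorflow as tf

# tf.all_variables() is deprecated; tf.global_variables() is the
# drop-in replacement for listing the graph's global variables.
n_vars = len(tf.global_variables())  # instead of len(tf.all_variables())
```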

nrontsis commented 5 years ago

@kyr-pol I think that I fixed it in a88e6e1 but Travis is complaining for some reason.