kyr-pol closed this issue 5 years ago
Working on this, I decided to print some extra information about the TensorFlow graph in https://github.com/nrontsis/PILCO/commit/68821119c554ed228bec3ff39d7d0167f294c2f0, and it seems that the number of operations in the graph increases on every iteration (the number of variables stays the same). I assumed this was because the number of data points increases, but after reducing the number of data points collected per run (a smaller horizon, T=10 instead of 40), the number of operations in the graph keeps increasing at the same pace (about 4000 per optimisation run). If this is unnecessary, we are wasting a lot of time. Do you think it's normal @nrontsis?
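A minimal sketch of how the growth above could be measured systematically. Nothing here is PILCO code: `count_ops` is a hypothetical placeholder for whatever returns the current op count (under TF 1.x that would be something like `len(tf.get_default_graph().get_operations())`), and `FakeGraph` just simulates the observed ~4000-ops-per-run leak.

```python
def track_graph_growth(count_ops, run_iteration, n_iters=3):
    """Return the op-count delta after each of n_iters iterations.

    count_ops: callable returning the current number of graph operations.
    run_iteration: callable performing one optimisation run.
    """
    deltas = []
    prev = count_ops()
    for _ in range(n_iters):
        run_iteration()
        now = count_ops()
        deltas.append(now - prev)  # ops added by this iteration alone
        prev = now
    return deltas


# Simulated graph that leaks 4000 ops per run, mimicking the observation above:
class FakeGraph:
    def __init__(self):
        self.n_ops = 10_000

    def run(self):
        # New tensors are (wrongly) rebuilt on every call instead of reused.
        self.n_ops += 4_000


g = FakeGraph()
print(track_graph_growth(lambda: g.n_ops, g.run))  # → [4000, 4000, 4000]
```

If the deltas stay constant even as the dataset shrinks (as reported above), the growth is coming from graph construction inside the optimisation loop rather than from the data itself.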
Very interesting, thanks for digging it up.
It might be related to this.
I will try to investigate today (unless the usual :P )
@kyr-pol which TF version are you using? I am getting deprecation warnings on 6882111:
WARNING:tensorflow:From inverted_pendulum.py:142: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
I'm running 1.6.0 and I'm getting the warning too; I just didn't notice it in the output. I'll replace the deprecated function. Good, that looks like the issue we are having; let's see which workaround is favoured.
@kyr-pol I think that I fixed it in a88e6e1 but Travis is complaining for some reason.
In many cases the user might want to re-initialise some component while keeping the rest as they are, for example:
Restarting the model or the controller optimisation to avoid getting trapped in local minima. We might want to restart one of the two (keeping the other intact), or restart both, or perform multiple restarts and keep the most promising version, etc.
Changing the reward function (while keeping the same model) and optimising the controller, so that previous episodes can be reused to solve tasks with new goals (a possible approach for gym's Reacher-v2 environment, or for transfer learning demos).
Since this might interfere with the TensorFlow graph (see https://github.com/GPflow/GPflow/issues/756 and https://github.com/GPflow/GPflow/issues/719), we might want to provide a method that takes care of it cleanly.
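A minimal sketch of the restart-and-keep-best idea mentioned above, deliberately independent of TensorFlow/GPflow graph details. The names `make_component` and `evaluate` are hypothetical placeholders, e.g. for re-initialising a controller with fresh random parameters and scoring it by predicted episode return.

```python
import random


def restart_and_keep_best(make_component, evaluate, n_restarts=3, seed=0):
    """Re-initialise a component several times and keep the best-scoring one.

    make_component: callable taking an RNG, returning a freshly initialised
    component (e.g. a controller with new random parameters).
    evaluate: callable scoring a component (higher is better).
    """
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_restarts):
        candidate = make_component(rng)  # fresh initialisation, model untouched
        score = evaluate(candidate)      # e.g. predicted return under the model
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Toy usage: "components" are scalars, scored by distance to a target of 3.0.
best, score = restart_and_keep_best(
    make_component=lambda rng: rng.uniform(0, 10),
    evaluate=lambda x: -abs(x - 3.0),
    n_restarts=5,
)
```

The point of structuring it this way is that re-initialisation only touches the component being restarted; in a graph-based framework the tricky part (per the GPflow issues linked above) is making `make_component` reuse or rebuild graph nodes cleanly instead of piling new operations onto the existing graph.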