Closed DavidVillero closed 4 years ago
I think this is due to the tf.cast() calls, which add nodes to the graph on each iteration. I would try to avoid doing any TF operations inside the callback, or if you need to, do it once and save the result in a global variable for reuse. That way the Tensors don't keep building up.
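The pattern can be illustrated without TensorFlow. Below is a pure-Python sketch (the `GrowingGraph` class is a hypothetical stand-in for a TF1 graph; no real TF API is used) of why calling a graph-building op like `tf.cast()` inside a per-iteration callback makes the graph grow, and how building the op once and caching the result in a global avoids it:

```python
# Hypothetical stand-in for a TF1 graph: every op call appends a node,
# mimicking how tf.cast() adds a new node to tf.Graph on each call.
class GrowingGraph:
    def __init__(self):
        self.ops = []

    def cast(self, name):
        self.ops.append(name)  # each call permanently adds a node
        return name

# Anti-pattern: building the op inside the per-iteration callback.
graph = GrowingGraph()

def callback_bad(i):
    graph.cast(f"cast_{i}")  # a new graph node on every call

for i in range(100):
    callback_bad(i)
print(len(graph.ops))  # 100 nodes after 100 iterations

# Fix: build the op once, cache the result in a global, and reuse it.
graph2 = GrowingGraph()
_cached_cast = None

def callback_good(i):
    global _cached_cast
    if _cached_cast is None:       # build the op only on the first call
        _cached_cast = graph2.cast("cast_once")
    return _cached_cast            # reuse the same node afterwards

for i in range(100):
    callback_good(i)
print(len(graph2.ops))  # 1 node, no matter how many iterations
```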
System information
Describe the problem
What am I trying to do? I'm training a DQN/Rainbow agent with noisy nets, and I'm noticing that as the model converges towards the ideal policy, exploration does not decline (maybe this is not a problem). Typically, noisy actions are not a problem as long as the model converges to an optimal policy, but I would like to find the source of this noise, or at least understand better why this happens. I would like to create the same plots shown in the Noisy Networks for Exploration paper, where they compare learning curves of the average noise parameter, sigma (an average of the sigma weight values in each layer of the target network).

How am I doing it? I managed to get the weight values for each layer of my target network and save them to a .csv by using an on_train_result callback, which looks like this:
Source code / logs
The callback works, but policy.sess.graph.get_operations() gets bigger every episode (I don't understand why this happens), causing the iteration

for op in policy.sess.graph.get_operations()

to take longer and longer every episode. Does anyone know what I'm doing wrong? And is there a better way of extracting the information I'm after? Thank you.
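On the "better way" question: rather than iterating over every operation in the graph each episode, one option is to read the weights once per on_train_result and filter them by variable name. The sketch below assumes a dict mapping variable names to numpy arrays, such as the one returned by `policy.get_weights()` in RLlib; the `"sigma"` substring for identifying noisy-net parameters and the example key names are assumptions you should check against your own model's variable names:

```python
import numpy as np

def mean_sigma_per_layer(weights, marker="sigma"):
    """Average the noisy-net sigma parameters per layer.

    `weights` is assumed to be a dict of variable name -> numpy array
    (e.g. the result of policy.get_weights() in RLlib); `marker` is the
    substring assumed to identify sigma variables - verify it against
    the actual variable names in your model.
    """
    return {
        name: float(np.mean(np.abs(arr)))  # mean |sigma| per layer
        for name, arr in weights.items()
        if marker in name
    }

# Fabricated weights dict standing in for policy.get_weights():
fake_weights = {
    "fc1/w_mu": np.ones((4, 4)),
    "fc1/w_sigma": np.full((4, 4), 0.5),
    "fc2/w_sigma": np.full((2, 2), 0.1),
}
print(mean_sigma_per_layer(fake_weights))
```

Appending one row of these per-layer averages to the .csv on each on_train_result avoids touching the graph at all, so nothing accumulates between episodes.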