yao62995 / A3C

Asynchronous Advantage Actor-Critic (A3C) and Progressive Neural Networks implemented in TensorFlow.

var.ref() is deprecated in recent TF version #2

Open kkjh0723 opened 7 years ago

kkjh0723 commented 7 years ago

I tried to run A3C_atari.py with TF version 0.11, and it raised an error; I found that the tf.Variable.ref() function is deprecated. I'm not sure how to change it. When I simply replace v.ref() with v, the variables don't seem to be updated, and the rewards are all fixed at 0.0.
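For context, a minimal sketch of the distinction involved (semantics assumed from pre-1.0 TF releases; the variable here is illustrative):

import tensorflow as tf  # assuming a pre-r0.11 release where ref() is public

v = tf.Variable(tf.zeros([2]), name="w")

# v.ref() returns the mutable, ref-typed tensor that state ops
# (tf.assign, tf.assign_add, ...) mutate in place.
mutable_ref = v.ref()

# Using v as a plain tensor yields a read-only value snapshot instead,
# so an op built against it cannot write back to the variable.
snapshot = tf.convert_to_tensor(v)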

yao62995 commented 7 years ago

@kkjh0723 I found that TF r0.11 renamed tf.Variable.ref() to tf.Variable._ref(). You just need to replace v.ref() with v._ref().
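For illustration, the change amounts to the following (a minimal sketch, assuming the r0.11 API where the method was made private):

import tensorflow as tf  # assuming TF r0.11

v = tf.Variable(0.0, name="w")
# ref = v.ref()   # pre-r0.11: public method returning the ref tensor
ref = v._ref()    # r0.11: the same ref tensor via the private method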

kkjh0723 commented 7 years ago

Thanks for the answer. However, another error comes up after making the change. Below is the message.

File "A3C_atari_mod.py", line 531, in main
model = A3CAtari()
File "A3C_atari_mod.py", line 467, in __init__
job = A3CSingleThread(thread_id, self)
File "A3C_atari_mod.py", line 282, in __init__
self.do_accum_grads_ops = self.do_accumulate_gradients()
File "A3C_atari_mod.py", line 329, in do_accumulate_gradients
accum_ops = tf.assign_add(accum_grad, grad, name=name)
File "/home/jinhyung/anaconda/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 75, in assign_add
use_locking=use_locking, name=name)
File "/home/jinhyung/anaconda/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 613, in apply_op
(input_name, op_type_name))
TypeError: Input 'ref' of 'AssignAdd' Op requires l-value input

It seems accum_grad is not a valid input for tf.assign_add. Do you have any idea?

yao62995 commented 7 years ago

@kkjh0723 I have upgraded TensorFlow to r0.11 but don't encounter this problem. Check the API, def assign_add(ref, value, use_locking=None, name=None); maybe you can try tf.assign_add(accum_grad._ref(), grad, name=name).
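For reference, a minimal sketch of a gradient accumulator built against that signature (the variable names are illustrative, not the repository's):

import tensorflow as tf  # assuming TF r0.11

var = tf.Variable(tf.zeros([4]), name="w")
grad = tf.placeholder(tf.float32, shape=[4])

# One non-trainable buffer per variable, incremented with each
# thread-local gradient.
accum_grad = tf.Variable(tf.zeros([4]), trainable=False, name="accum_w")

# AssignAdd requires an l-value (a mutable ref tensor), so pass the
# buffer's ref rather than a read-only snapshot of its value.
accum_op = tf.assign_add(accum_grad._ref(), grad, name="accum_w_add")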

kkjh0723 commented 7 years ago

I ran the program as you suggested (changing v.ref() to v._ref()) and it runs now. But the problem is that the reward changes at the very beginning, then after some period it becomes fixed at 0. Below is the log I got from the program.

2016-12-07 10:57:26 [INFO] game=Breakout-v0, train_step=14342, episode=10, reward(avg:1.90, mid:2.00, std:1.04), time=23(s)
2016-12-07 10:59:21 [INFO] game=Breakout-v0, train_step=63947, episode=10, reward(avg:0.20, mid:0.00, std:0.60), time=85(s)
2016-12-07 11:01:14 [INFO] game=Breakout-v0, train_step=113556, episode=10, reward(avg:0.00, mid:0.00, std:0.00), time=83(s)
2016-12-07 11:03:08 [INFO] game=Breakout-v0, train_step=163278, episode=10, reward(avg:0.00, mid:0.00, std:0.00), time=84(s)
2016-12-07 11:05:02 [INFO] game=Breakout-v0, train_step=212951, episode=10, reward(avg:0.00, mid:0.00, std:0.00), time=84(s)
2016-12-07 11:06:56 [INFO] game=Breakout-v0, train_step=262711, episode=10, reward(avg:0.00, mid:0.00, std:0.00), time=84(s)
2016-12-07 11:08:50 [INFO] game=Breakout-v0, train_step=312327, episode=10, reward(avg:0.00, mid:0.00, std:0.00), time=84(s)
2016-12-07 11:10:43 [INFO] game=Breakout-v0, train_step=362089, episode=10, reward(avg:0.00, mid:0.00, std:0.00), time=84(s)

The time per interval increases, and the rewards go to 0 after the third print and do not change afterward. Don't you have this problem? I didn't change the hyperparameters you set.

yao62995 commented 7 years ago

@kkjh0723 I'm not sure about this problem. I'll look into it soon.

kkjh0723 commented 7 years ago

@yao62995 Thanks very much!!