yrlu / reinforcement_learning

Implementation of selected reinforcement learning algorithms in Tensorflow. A3C, DDPG, REINFORCE, DQN, etc.
MIT License

Errors in running with tf 1.0.0 (DDPG) #1

Closed Amir-Ramezani closed 7 years ago

Amir-Ramezani commented 7 years ago

Hi,

I am trying to run your code with tf 1.0.0. Besides some changes, such as renaming mul to multiply, I get the following error:

ddpg$ python pendulum_ddpg.py --device=cpu --episodes=300

Namespace(device='cpu', episodes=300, log_dir='/tmp/pendulum-log-0')
[2017-04-04 20:02:25,839] Making new env: Pendulum-v0
Traceback (most recent call last):
  File "pendulum_ddpg.py", line 95, in <module>
    critic = CriticNetwork(state_size=STATE_SIZE, action_size=ACTION_SIZE, lr=CRITIC_LEARNING_RATE, tau=TAU)
  File "/home/amir-ai/DDPG-Codes/stormmax-TF-VERSION-MISMACH/reinforcement_learning-master/ddpg/critic.py", line 27, in __init__
    self.input_s, self.action, self.critic_variables, self.q_value = self._build_network("critic")
  File "/home/amir-ai/DDPG-Codes/stormmax-TF-VERSION-MISMACH/reinforcement_learning-master/ddpg/critic.py", line 44, in _build_network
    layer_2 = tf_utils.fc(tf.concat(1, (layer_1, action)), self.n_h2, scope="fc2", activation_fn=tf.nn.relu,
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1029, in concat
    dtype=dtypes.int32).get_shape(
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 637, in convert_to_tensor
    as_ref=False)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 702, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 110, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 99, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_util.py", line 367, in make_tensor_proto
    _AssertCompatible(values, dtype)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
    (dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

Could you tell me how to fix this? Also, which tf version are you using, and have you upgraded to tf 1.0.0?

Thanks,
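The TypeError above is consistent with the argument-order change to tf.concat in TensorFlow 1.0: the 0.x signature was tf.concat(concat_dim, values), while 1.0 and later expect tf.concat(values, axis), so the old call passes the tensor list where an int32 axis is expected. A minimal sketch of a version-tolerant wrapper follows; the helper name concat_compat is hypothetical and not part of the repository, and the tensorflow module is passed in as an argument so the sketch itself has no import-time dependency.

```python
def concat_compat(tf, values, axis):
    """Concatenate `values` along `axis` on both old and new TensorFlow.

    `tf` is the already-imported tensorflow module. This helper is a
    hypothetical illustration, not code from the repository.
    """
    major = int(tf.__version__.split(".")[0])
    if major >= 1:
        # TensorFlow >= 1.0 expects tf.concat(values, axis)
        return tf.concat(values, axis)
    # TensorFlow 0.x expects tf.concat(concat_dim, values)
    return tf.concat(axis, values)
```

Under these assumptions, the call in critic.py, tf.concat(1, (layer_1, action)), would become concat_compat(tf, (layer_1, action), 1), or simply tf.concat((layer_1, action), 1) if only TensorFlow 1.x needs to be supported.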

Amir-Ramezani commented 7 years ago

I fixed that one (the concat function). Now I have the following error:

python pendulum_ddpg.py --device=cpu --episodes=300
Namespace(device='cpu', episodes=300, log_dir='/tmp/pendulum-log-0')
[2017-04-04 23:28:15,843] Making new env: Pendulum-v0
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
WARNING:tensorflow:From pendulum_ddpg.py:102: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use tf.global_variables_initializer instead.
[2017-04-04 23:28:16,400] From pendulum_ddpg.py:102: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use tf.global_variables_initializer instead.
====evaluation====
Traceback (most recent call last):
  File "pendulum_ddpg.py", line 104, in <module>
    train(agent, env, sess)
  File "pendulum_ddpg.py", line 69, in train
    env.render()
  File "/usr/local/lib/python2.7/dist-packages/gym/core.py", line 174, in render
    return self._render(mode=mode, close=close)
  File "/usr/local/lib/python2.7/dist-packages/gym/envs/classic_control/pendulum.py", line 66, in _render
    from gym.envs.classic_control import rendering
  File "/usr/local/lib/python2.7/dist-packages/gym/envs/classic_control/rendering.py", line 23, in <module>
    from pyglet.gl import *
  File "/usr/local/lib/python2.7/dist-packages/pyglet/gl/__init__.py", line 236, in <module>
    import pyglet.window
  File "/usr/local/lib/python2.7/dist-packages/pyglet/window/__init__.py", line 1817, in <module>
    gl._create_shadow_window()
  File "/usr/local/lib/python2.7/dist-packages/pyglet/gl/__init__.py", line 205, in _create_shadow_window
    _shadow_window = Window(width=1, height=1, visible=False)
  File "/usr/local/lib/python2.7/dist-packages/pyglet/window/xlib/__init__.py", line 163, in __init__
    super(XlibWindow, self).__init__(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pyglet/window/__init__.py", line 505, in __init__
    config = screen.get_best_config(template_config)
  File "/usr/local/lib/python2.7/dist-packages/pyglet/canvas/base.py", line 161, in get_best_config
    configs = self.get_matching_configs(template)
  File "/usr/local/lib/python2.7/dist-packages/pyglet/canvas/xlib.py", line 179, in get_matching_configs
    configs = template.match(canvas)
  File "/usr/local/lib/python2.7/dist-packages/pyglet/gl/xlib.py", line 29, in match
    have_13 = info.have_version(1, 3)
  File "/usr/local/lib/python2.7/dist-packages/pyglet/gl/glx_info.py", line 89, in have_version
    client = [int(i) for i in client_version.split('.')]
ValueError: invalid literal for int() with base 10: 'None'

yrlu commented 7 years ago

Hi,

Thanks for the feedback. I just tried the latest version of the code on my machine, and it runs properly, so I think the problem is probably due to version mismatches. My Tensorflow version is 0.12.1 and my gym version is 0.8.0. Hope that helps!

Regards,

Amir-Ramezani commented 7 years ago

Thanks. I downgraded tensorflow to 0.12.1, but the previous problem was still there (ValueError: invalid literal for int() with base 10: 'None'). After some searching online, I found a suggested solution: move the env.render() call so that it runs before tensorflow is initialized. I followed that solution by adding the reset and render calls right after the environment is created, before the networks are built:

env = gym.make('Pendulum-v0')

env.reset()
env.render()

actor = ActorNetwork(state_size=STATE_SIZE, action_size=ACTION_SIZE, lr=ACTOR_LEARNING_RATE, tau=TAU)

Now it shows the gym window for a second, and then I get the following new error:

ddpg$ python pendulum_ddpg.py --device=gpu --episodes=300
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Namespace(device='gpu', episodes=300, log_dir='/tmp/pendulum-log-0')
[2017-04-05 18:06:47,174] Making new env: Pendulum-v0
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.898
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 60.06MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 60.06M (62980096 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 54.06M (56682240 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 48.65M (51014144 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
WARNING:tensorflow:From pendulum_ddpg.py:106 in <module>.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use tf.global_variables_initializer instead.
[2017-04-05 18:06:48,658] From pendulum_ddpg.py:106 in <module>.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use tf.global_variables_initializer instead.
====evaluation====
E tensorflow/stream_executor/cuda/cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
W tensorflow/stream_executor/stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "pendulum_ddpg.py", line 108, in <module>
    train(agent, env, sess)
  File "pendulum_ddpg.py", line 72, in train
    action = agent.get_action(cur_state, sess)[0]
  File "/home/amir-ai/DDPG-Codes/stormmax-TF-VERSION-MISMACH/reinforcement_learning-master/ddpg/ddpg.py", line 35, in get_action
    action = self.actor.get_action(state, sess) * self.action_bound
  File "/home/amir-ai/DDPG-Codes/stormmax-TF-VERSION-MISMACH/reinforcement_learning-master/ddpg/actor.py", line 52, in get_action
    return sess.run(self.action_values, feed_dict={self.input_s: state})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(1, 3), b.shape=(3, 400), m=1, n=400, k=3
  [[Node: actor/fc1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_1, actor/fc1/W/read)]]

Caused by op u'actor/fc1/MatMul', defined at:
  File "pendulum_ddpg.py", line 98, in <module>
    actor = ActorNetwork(state_size=STATE_SIZE, action_size=ACTION_SIZE, lr=ACTOR_LEARNING_RATE, tau=TAU)
  File "/home/amir-ai/DDPG-Codes/stormmax-TF-VERSION-MISMACH/reinforcement_learning-master/ddpg/actor.py", line 28, in __init__
    self.input_s, self.actor_variables, self.action_values = self._build_network("actor")
  File "/home/amir-ai/DDPG-Codes/stormmax-TF-VERSION-MISMACH/reinforcement_learning-master/ddpg/actor.py", line 42, in _build_network
    initializer=tf.contrib.layers.variance_scaling_initializer(mode="FAN_IN"))
  File "/home/amir-ai/DDPG-Codes/stormmax-TF-VERSION-MISMACH/reinforcement_learning-master/ddpg/tf_utils.py", line 49, in fc
    fc1 = tf.add(tf.matmul(x, W), b)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1729, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1442, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Blas SGEMM launch failed : a.shape=(1, 3), b.shape=(3, 400), m=1, n=400, k=3
  [[Node: actor/fc1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_1, actor/fc1/W/read)]]

yrlu commented 7 years ago

The code should be fast enough to run on CPU. Could you try:

$ python pendulum_ddpg.py --device=cpu --episodes=300
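The GPU log above reports only 60.06MiB free out of 7.92GiB, followed by CUDA_ERROR_OUT_OF_MEMORY, which suggests (as an assumption, not something the thread confirms) that another process was holding the GPU memory. Independently of the script's own --device flag, a process can be kept off the GPU by setting the standard CUDA_VISIBLE_DEVICES environment variable before TensorFlow is imported. A minimal sketch, with the helper name hide_gpus being hypothetical:

```python
import os


def hide_gpus(environ=None):
    """Hide all CUDA devices from libraries that are imported afterwards.

    `environ` defaults to os.environ; a plain dict can be passed instead.
    Hypothetical helper, not part of the repository.
    """
    env = os.environ if environ is None else environ
    # "-1" is not a valid device index, so no GPU is visible to CUDA.
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


if __name__ == "__main__":
    hide_gpus()
    # TensorFlow must be imported only after the variable is set, e.g.:
    # import tensorflow as tf
```

The same effect can be had from the shell by prefixing the command with CUDA_VISIBLE_DEVICES=-1.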

Amir-Ramezani commented 7 years ago

I tried with the cpu parameter, but it freezes after the first episode (the pendulum freezes and does not move at all):

ddpg$ python pendulum_ddpg.py --device=cpu --episodes=300
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Namespace(device='cpu', episodes=300, log_dir='/tmp/pendulum-log-0')
[2017-04-06 12:11:22,435] Making new env: Pendulum-v0
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.898
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 6.64GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
WARNING:tensorflow:From pendulum_ddpg.py:106 in <module>.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use tf.global_variables_initializer instead.
[2017-04-06 12:11:23,874] From pendulum_ddpg.py:106 in <module>.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use tf.global_variables_initializer instead.
====evaluation====
Episode 0 finished after 10000 timesteps, cum_reward: -55652.0191741
[-0.22457573]
Episode 1 finished after 10000 timesteps, cum_reward: -75135.8944072
[-2.]

Amir-Ramezani commented 7 years ago

Okay, this time I tested with tf 0.12.1. This is what is printed once the pendulum freezes, which happens after the first episode finishes:

action: [-0.38487354] action: [-0.58512461] action: [-0.81554598] action: [-1.0566802] action: [-1.27760756] action: [-1.46067202] action: [-1.59311712] action: [-1.67671788] action: [-1.72090757] action: [-1.73762679] action: [-1.73265874] action: [-1.71370041] action: [-1.68674815] action: [-1.65958309] action: [-1.64037359] action: [-1.63696158] action: [-1.65651488] action: [-1.70190346] action: [-1.76903963] action: [-1.84198368] action: [-1.90575802] action: [-1.95135677] action: [-1.97490346] action: [-1.97858274] action: [-1.98142064] action: [-1.98368716] action: [-1.98556185] action: [-1.98717892] action: [-1.98716283] action: [-1.98500359] action: [-1.98137581] action: [-1.9767555] action: [-1.97258258] action: [-1.97108066] action: [-1.97373033] action: [-1.98009074] action: [-1.98741102] action: [-1.9933573] action: [-1.99607182] action: [-1.99608123] action: [-1.99596763] action: [-1.99573469] action: [-1.99543393] action: [-1.99506128] action: [-1.99464703] action: [-1.99291754] action: [-1.98933053] action: [-1.9835906] action: [-1.97565043] action: [-1.96708775] action: [-1.96103573] action: [-1.96053863] action: [-1.96625841] action: [-1.97501588] action: [-1.98391533] action: [-1.98527539] action: [-1.98313618] action: [-1.98025417] action: [-1.97664547] action: [-1.97202492] action: [-1.96572077] action: [-1.95563447] action: [-1.93450117] action: [-1.89873874] action: [-1.84621942] action: [-1.77909195] action: [-1.70402038] action: [-1.6371218] action: [-1.5946933] action: [-1.5829283] action: [-1.59469318] action: [-1.61972153] action: [-1.61004293] action: [-1.54189277] action: [-1.45627677] action: [-1.34555471] action: [-1.20569563] action: [-1.00605762] action: [-0.72520745] action: [-0.3912178] action: [-0.07667994] action: [ 0.18435697] action: [ 0.37708578] action: [ 0.51098329] action: [ 0.60057694] action: [ 0.65180397] action: [ 0.68357408] action: [ 0.70951432] action: [ 0.73351908] action: [ 0.75949579] action: [ 0.79375297] 
action: [ 0.83251274] action: [ 0.87939984] action: [ 0.92684138] action: [ 0.97185111] action: [ 1.00252771] action: [ 1.02369118] action: [ 1.00792813] action: [ 0.95578331] action: [ 0.88404506] action: [ 0.82539588] action: [ 0.78419346] action: [ 0.78334504] action: [ 0.81200349] action: [ 0.86788136] action: [ 0.92905474] action: [ 0.97647309] action: [ 0.99297351] action: [ 0.96735859] action: [ 0.93084317] action: [ 0.8875975] action: [ 0.81204742] action: [ 0.79363471] action: [ 1.02052164] action: [ 1.35935426] action: [ 1.61784327] action: [ 1.79163611] action: [ 1.89111602] action: [ 1.94083798] action: [ 1.96616757] action: [ 1.97830093] action: [ 1.98398387] action: [ 1.98623013] action: [ 1.98631179] action: [ 1.98480928] action: [ 1.98189127] action: [ 1.97826815] action: [ 1.97501874] action: [ 1.97361243] action: [ 1.97572136] action: [ 1.97990203] action: [ 1.98507261] action: [ 1.99011946] action: [ 1.99396443] action: [ 1.99455035] action: [ 1.99296224] action: [ 1.99068594] action: [ 1.98807061] action: [ 1.98595035] action: [ 1.98530626] action: [ 1.98528695] action: [ 1.98346519] action: [ 1.98060811] action: [ 1.97691739] action: [ 1.97351575] action: [ 1.9722482] action: [ 1.97382843] action: [ 1.97611487] action: [ 1.97956145] action: [ 1.98336875] action: [ 1.986444] action: [ 1.9806993] action: [ 1.97140777] action: [ 1.95837748] action: [ 1.94567227] action: [ 1.93882847] action: [ 1.93858933] action: [ 1.93789995] action: [ 1.93274581] action: [ 1.92830551] action: [ 1.92740154] action: [ 1.9273324] action: [ 1.93111682] action: [ 1.93600786] action: [ 1.93999004] action: [ 1.94363165] action: [ 1.94912708] action: [ 1.95295501] action: [ 1.93292546] action: [ 1.90727019] action: [ 1.88904953] action: [ 1.89442289] action: [ 1.91993606] action: [ 1.9498384] action: [ 1.96711516] action: [ 1.97814608] action: [ 1.98521984] action: [ 1.98939979] action: [ 1.99217319] action: [ 1.9946909] action: [ 1.99650586] action: [ 1.99780178] 
action: [ 1.99874425] action: [ 1.99934912] action: [ 1.9995091] action: [ 1.99936557] action: [ 1.99916399] action: [ 1.99899232] action: [ 1.99903393] action: [ 1.99927819] action: [ 1.99958563] action: [ 1.99972272] action: [ 1.9997716] action: [ 1.99978769] action: [ 1.99979842] action: [ 1.99982417] action: [ 1.99987042] action: [ 1.99991739] action: [ 1.99995863] action: [ 1.99998248] action: [ 1.9999938] action: [ 1.99999297] action: [ 1.99998903] action: [ 1.99998438] action: [ 1.99998116] action: [ 1.99998295] action: [ 1.99998915] action: [ 1.99999344] action: [ 1.99999428] action: [ 1.99999368] action: [ 1.99999273] action: [ 1.99999118] action: [ 1.99999094] action: [ 1.99999356] action: [ 1.9999963] action: [ 1.99999821] action: [ 1.99999964] action: [ 2.] action: [ 1.99999976] action: [ 1.99999952] action: [ 1.99999928] action: [ 1.99999905] action: [ 1.99999928] action: [ 1.99999952] action: [ 1.99999976] action: [ 1.99999964] action: [ 1.99999928] action: [ 1.99999905] action: [ 1.99999905] action: [ 1.99999869] action: [ 1.99999905] action: [ 1.9999994] action: [ 1.99999964] action: [ 2.] action: [ 2.] action: [ 2.] action: [ 2.00000024] action: [ 1.99999964] action: [ 1.99999988] action: [ 1.99999988] action: [ 2.] action: [ 2.] action: [ 1.99999964] action: [ 1.99999988] action: [ 1.99999976] action: [ 1.99999976] action: [ 1.99999952] action: [ 1.99999976] action: [ 1.99999964] action: [ 2.] action: [ 2.] action: [ 2.] action: [ 2.] action: [ 2.] action: [ 2.] action: [ 2.] action: [ 2.] action: [ 2.] action: [ 2.] action: [ 2.] action: [ 2.] action: [ 2.] action: [ 1.99999964] action: [ 2.] action: [ 2.]
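In the log above the actions drift toward 2 and then stay there. One way to put a number on that is to measure what fraction of a window of logged actions sits at the action bound. The sketch below assumes a bound of 2.0, read off the logged values and consistent with Pendulum-v0's torque range of [-2, 2]; the function name and the tolerance are hypothetical choices.

```python
def saturation_fraction(actions, bound=2.0, tol=1e-3):
    """Return the fraction of actions within `tol` of +bound or -bound."""
    if not actions:
        return 0.0
    saturated = sum(1 for a in actions if abs(abs(a) - bound) <= tol)
    return saturated / float(len(actions))


# A few values copied from the start and the end of the log above.
early = [-0.38487354, -0.58512461, -0.81554598]
tail = [1.99999964, 2.0, 2.0, 2.00000024, 1.99999988]

print(saturation_fraction(early))  # prints 0.0
print(saturation_fraction(tail))   # prints 1.0
```

A fraction near 1.0 over a long window means the policy is emitting a constant maximal torque, which matches a pendulum that appears not to move.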

yrlu commented 7 years ago

It seems that you are using a different version of gym than mine. I made an update to the code that might solve this problem; please check the latest pendulum_ddpg.py.

Amir-Ramezani commented 7 years ago

Okay, your change to the code made it better and helped me understand what my issue with the code was. The following link is a video captured from the screen after changing max_step to 200. As you can see, the window froze (although the code was still processing): https://drive.google.com/file/d/0Bwsg0i6pm8bSOXIxWHZlVTNidHM/view

And this one is after I added the render call in the else branch, which cleared up my misunderstanding: https://drive.google.com/file/d/0Bwsg0i6pm8bSbTlsUVpkVU5nMzg/view

for t in xrange(MAX_STEPS):
  if (i % EVALUATE_EVERY) == 0:
    env.render()
    action = agent.get_action(cur_state, sess)[0]
  else:
    env.render()  # added line

Both of them work correctly.

Thanks for your code and help.
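The apparent freeze described above is what one would expect if the window is redrawn only during evaluation episodes: between them, training keeps running but nothing is rendered. A minimal sketch of that behaviour follows, written for Python 3 (range instead of xrange) and using a hypothetical StubEnv that only counts render calls in place of a real gym environment; the evaluate_every value in the usage line is likewise an assumption.

```python
class StubEnv(object):
    """Hypothetical stand-in for a gym environment; counts render calls."""

    def __init__(self):
        self.render_calls = 0

    def reset(self):
        return 0.0

    def step(self, action):
        # observation, reward, done, info
        return 0.0, 0.0, False, {}

    def render(self):
        self.render_calls += 1


def run_episodes(env, episodes, max_steps, evaluate_every, render_always):
    """Run a dummy loop and return how many frames were rendered."""
    for i in range(episodes):
        env.reset()
        evaluating = (i % evaluate_every) == 0
        for _ in range(max_steps):
            # Without render_always, the window is redrawn only while
            # evaluating and looks frozen during every other episode.
            if evaluating or render_always:
                env.render()
            _, _, done, _ = env.step(0.0)
            if done:
                break
    return env.render_calls


print(run_episodes(StubEnv(), 10, 200, 5, render_always=False))  # prints 400
print(run_episodes(StubEnv(), 10, 200, 5, render_always=True))   # prints 2000
```

Rendering every step makes progress visible but slows training, so rendering only on evaluation episodes is a reasonable default once the behaviour is understood.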