uber-research / ape-x

This repo replicates the results Horgan et al. obtained in "Distributed Prioritized Experience Replay".
Apache License 2.0

example script apex.py crashes #6

Open eleninisioti opened 2 years ago

eleninisioti commented 2 years ago

Hi! I managed to install this using Python 3.6 and TensorFlow 1.13.1 (make crashed with tensorflow-gpu, so I disabled the GPU).

Running "python apex.py --env video_pinball --num-timesteps 1000000000 --logdir=/tmp/agent" crashes after a few seconds with:

INFO:tensorflow:Saving checkpoints for 768 into /tmp/agent/model.ckpt.
Failed to unreference resources
Failed to unreference resources
Failed to unreference resources
Failed to unreference resources
Traceback (most recent call last):
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_5_training_queue/prefetch_queue/fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
  [[{{node fifo_queue_Dequeue_1}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "apex.py", line 379, in learn
    lambda step_context: step_context.session.run([update_target, stage_op]))
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 737, in run_step_fn
    return self._sess.run_step_fn(step_fn, self._tf_sess(), run_with_hooks=None)
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1191, in run_step_fn
    return self._sess.run_step_fn(step_fn, raw_session, run_with_hooks)
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1098, in run_step_fn
    return step_fn(_MonitoredSession.StepContext(raw_session, run_with_hooks))
  File "apex.py", line 379, in <lambda>
    lambda step_context: step_context.session.run([update_target, stage_op]))
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_5_training_queue/prefetch_queue/fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
  [[node fifo_queue_Dequeue_1 (defined at apex.py:336) ]]

Caused by op 'fifo_queue_Dequeue_1', defined at:
  File "apex.py", line 437, in <module>
    cli()
  File "apex.py", line 433, in cli
    main(env=args.env, num_timesteps=args.num_timesteps)
  File "apex.py", line 48, in main
    **kwargs
  File "apex.py", line 336, in learn
    train_dequeue = training_fifo.dequeue()
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/ops/data_flow_ops.py", line 445, in dequeue
    self._queue_ref, self._dtypes, name=name)
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 3908, in queue_dequeue_v2
    timeout_ms=timeout_ms, name=name)
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

OutOfRangeError (see above for traceback): FIFOQueue '_5_training_queue/prefetch_queue/fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
  [[node fifo_queue_Dequeue_1 (defined at apex.py:336) ]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "apex.py", line 437, in <module>
    cli()
  File "apex.py", line 433, in cli
    main(env=args.env, num_timesteps=args.num_timesteps)
  File "apex.py", line 48, in main
    **kwargs
  File "apex.py", line 386, in learn
    sess.run_step_fn(lambda step_context: step_context.session.run([update_target]))
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 788, in __exit__
    self._close_internal(exception_type)
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 826, in _close_internal
    self._sess.close()
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1082, in close
    self._sess.close()
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1244, in close
    ignore_live_threads=True)
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/training/queue_runner_impl.py", line 257, in _run
    enqueue_callable()
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1257, in _single_operation_run
    self._call_tf_sessionrun(None, {}, [], target_list, None)
  File "/home/elena/anaconda3/envs/tf_old/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnimplementedError: Generic conv implementation only supports NHWC tensor format for now.
  [[{{node deepq/read_q_func/convnet/Conv/Conv2D}}]]
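
Going only by the last message, the root cause looks like a convolution that expects channels-first (NCHW) input while running on the CPU, where only channels-last (NHWC) is supported; the FIFOQueue errors above would then be a side effect of the enqueue thread dying. This is an assumption drawn from the error text alone, not from reading apex.py. To make the layout difference concrete, here is a minimal pure-Python sketch; the helper name `nchw_to_nhwc` and the toy batch are hypothetical and not part of the repo.

```python
def nchw_to_nhwc(batch):
    """Reorder a nested list indexed [n][c][h][w] into [n][h][w][c]."""
    return [
        [
            # Gather the value of every channel at pixel (h, w).
            [[image[c][h][w] for c in range(len(image))]
             for w in range(len(image[0][0]))]
            for h in range(len(image[0]))
        ]
        for image in batch
    ]


# Toy batch: 1 image, 2 channels, height 2, width 3 (channels first).
batch = [[
    [[0, 1, 2], [3, 4, 5]],        # channel 0
    [[10, 11, 12], [13, 14, 15]],  # channel 1
]]

converted = nchw_to_nhwc(batch)

# After conversion the channel values of one pixel sit next to each other.
print(converted[0][1][2])
```

In TensorFlow 1.x the layout is normally chosen through the `data_format` argument of the convolution op, so if this guess is right, the place to look would be wherever the convnet in the `deepq` scope is built.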