The graph assembles just fine, but after finalization, once training begins, one of the threads (the main thread, I assume?) hangs. This is the traceback:
-------------------- Thread 4590249408 --------------------
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/site-packages/tensorflow/python/training/queue_runner_impl.py", line 257, in _run
enqueue_callable()
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1257, in _single_operation_run
self._call_tf_sessionrun(None, {}, [], target_list, None)
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
I also get this traceback for the same thread (is it hanging here as well?):
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/site-packages/tensorflow/python/training/queue_runner_impl.py", line 293, in _close_on_stop
coord.wait_for_stop()
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 311, in wait_for_stop
return self._stop_event.wait(timeout)
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/threading.py", line 551, in wait
signaled = self._cond.wait(timeout)
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/threading.py", line 295, in wait
waiter.acquire()
Thread tracebacks were acquired using https://gist.github.com/niccokunzmann/6038331. I assume this happens because each instance of btc_env creates a new psycopg2 connection to the history database hosted by PostgreSQL, but I don't know how to fix it.
Although I know this is an outside project, any insight would be greatly appreciated. If the DeepMind IMPALA implementation demonstrates decent results, I would be happy to share the implementation here once I get it working.
Apologies, the same thread also has this traceback (i.e. it is stuck for more than 10 seconds):
-------------------- Thread 4590249408 --------------------
File "run.py", line 656, in <module>
tf.app.run()
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "run.py", line 650, in main
train(action_set, level_names)
File "run.py", line 571, in train
lambda step_context: step_context.session.run(stage_op))
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 732, in run_step_fn
return self._sess.run_step_fn(step_fn, self._tf_sess(), run_with_hooks=None)
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1176, in run_step_fn
return self._sess.run_step_fn(step_fn, raw_session, run_with_hooks)
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1083, in run_step_fn
return step_fn(_MonitoredSession.StepContext(raw_session, run_with_hooks))
File "run.py", line 571, in <lambda>
lambda step_context: step_context.session.run(stage_op))
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/Users/hughalessi/miniconda3/envs/rl/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
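For completeness, the per-thread stack dumps above can also be reproduced with the standard library alone, in the same spirit as the linked gist (which likewise builds on `sys._current_frames`). This is just a sketch of the capture technique, not part of the training code:

```python
import sys
import threading
import traceback

def dump_all_thread_stacks(out=sys.stderr):
    """Print the current stack of every live thread, formatted similarly
    to the thread dumps above. Useful for spotting where a hung thread
    (e.g. a blocked queue-runner) is actually waiting."""
    names = {t.ident: t.name for t in threading.enumerate()}
    for thread_id, frame in sys._current_frames().items():
        print("-------------------- Thread %d (%s) --------------------"
              % (thread_id, names.get(thread_id, "?")), file=out)
        traceback.print_stack(frame, file=out)
```

Calling `dump_all_thread_stacks()` from a watchdog thread (or a signal handler) every few seconds makes it easy to confirm whether the enqueue thread stays parked inside `_call_tf_sessionrun` while the coordinator thread waits in `wait_for_stop`.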