maxpumperla / deep_learning_and_the_game_of_go

Code and other material for the book "Deep Learning and the Game of Go"
https://www.manning.com/books/deep-learning-and-the-game-of-go

Master Branch: ValueError: Tensor Tensor("dense_2_1/Softmax:0", shape=(?, 361), dtype=float32) is not an element of this graph. 127.0.0.1 - - [06/Apr/2019 16:22:53] "POST /select-move/predict HTTP/1.1" 500 #25

Closed bjordan2010 closed 5 years ago

bjordan2010 commented 5 years ago

When I run end_to_end.py, or web_demo.py --predict-agent path-to-deep_bot.h5, using either the Chapter 8 code or master, and play my first move at http://127.0.0.1:5000/static/play_predict_19.html, I get the stack trace below. Could it be that TensorFlow 1.13.1 and Keras 2.2.4 (the latest for Python 3 on a Mac) are not compatible with your code?

EDIT: It seems to be failing on this line in ops.py of the TensorFlow library, for both betago.hdf5 and deep_bot.h5:

  c = self.as_graph_element(c)   # <---- fails here
  if isinstance(c, Tensor):
    c = c.op
  elif not isinstance(c, Operation):
    raise TypeError("Control input must be Operation or Tensor: %s" % c)
  if c not in current:
    control_ops.append(c)
    current.add(c)
return self._ControlDependenciesController(self, control_ops)

The isinstance(c, Tensor) check on the next line would be True, so the failure is in the graph lookup itself. Perhaps this is a change between 1.8 and 1.13, with the graph now being checked first. I found a report stating that 1.12 exports can't be used with TensorFlow 1.13, so I am likely hitting that problem with the betago.hdf5 bot. I will comment out the line and try the deep_bot.

EDIT: Once I commented out the graph check (which may cause more issues, but gets me past the first error), I get another error. It seems to be related to https://stackoverflow.com/questions/46725323/keras-tensorflow-exception-while-predicting-from-multiple-threads. Could you post how and where we would warm up the model for the deep bot in the web demo? I assume it would be something like:

self.model.predict(np.array([[0, 0]])) # warmup

but I tried that in predict.py before the real predict call and it fails with an error. I'll keep debugging but any help would be appreciated. None of your code will run in the web app until this is resolved.

Stack Trace:

[2019-04-06 16:22:53,754] ERROR in app: Exception on /select-move/predict [POST]
Traceback (most recent call last):
  File "/Users/driftwood/.virtualenvs/myvenv/lib/python3.7/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/driftwood/.virtualenvs/myvenv/lib/python3.7/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/driftwood/.virtualenvs/myvenv/lib/python3.7/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/driftwood/.virtualenvs/myvenv/lib/python3.7/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/Users/driftwood/.virtualenvs/myvenv/lib/python3.7/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/driftwood/.virtualenvs/myvenv/lib/python3.7/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/driftwood/Documents/devel/python/dlgo/code/dlgo/httpfrontend/server.py", line 55, in select_move
    bot_move = bot_agent.select_move(game_state)
  File "/Users/driftwood/Documents/devel/python/dlgo/code/dlgo/agent/predict.py", line 32, in select_move
    move_probs = self.predict(game_state)
  File "/Users/driftwood/Documents/devel/python/dlgo/code/dlgo/agent/predict.py", line 28, in predict
    return self.model.predict(input_tensor)[0]
  File "/Users/driftwood/.virtualenvs/myvenv/lib/python3.7/site-packages/keras/engine/training.py", line 1164, in predict
    self._make_predict_function()
  File "/Users/driftwood/.virtualenvs/myvenv/lib/python3.7/site-packages/keras/engine/training.py", line 554, in _make_predict_function
    **kwargs)
  File "/Users/driftwood/.virtualenvs/myvenv/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2744, in function
    return Function(inputs, outputs, updates=updates, **kwargs)
  File "/Users/driftwood/.virtualenvs/myvenv/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2546, in __init__
    with tf.control_dependencies(self.outputs):
  File "/Users/driftwood/.virtualenvs/myvenv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 5028, in control_dependencies
    return get_default_graph().control_dependencies(control_inputs)
  File "/Users/driftwood/.virtualenvs/myvenv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 4528, in control_dependencies
    c = self.as_graph_element(c)
  File "/Users/driftwood/.virtualenvs/myvenv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3478, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/Users/driftwood/.virtualenvs/myvenv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3557, in _as_graph_element_locked
    raise ValueError("Tensor %s is not an element of this graph." % obj)
ValueError: Tensor Tensor("dense_2_1/Softmax:0", shape=(?, 361), dtype=float32) is not an element of this graph.
127.0.0.1 - - [06/Apr/2019 16:22:53] "POST /select-move/predict HTTP/1.1" 500 -

maxpumperla commented 5 years ago

@bjordan2010 sorry for the inconvenience. Have you tried downgrading TF for testing purposes? It's impossible for us to foresee changes in upstream libraries. I've never seen this error before; have you, @macfergus?

bjordan2010 commented 5 years ago

@maxpumperla I tried downgrading both Keras and TF, but:

1) TF 1.8 is no longer available in pip, so 1.10.0 was as low as I could go.
2) I had generated the bot with the newer versions, so I ran into other issues, since the bot was generated with 1.13 and I was running with 1.10.

So I thought that getting it running with the newer versions would help new people coming to the book now.

To recreate the issue, all one should have to do is:

1) Upgrade to keras 2.2.4 and tf 1.13.0.
2) Clone/Pull the master code.
3) Run the end_to_end.py script.

If someone does that and does not see the error I am getting then I'll assume it is something that I have done on my end but I do not think that is the case.

macfergus commented 5 years ago

Hi @bjordan2010, thank you for bringing this to our attention. Can you pull this diff: https://github.com/maxpumperla/deep_learning_and_the_game_of_go/commit/0797b3eef3c74f6be0a0ed39c719767978ef9f55 and see if that fixes it?

After much debugging, I think the problem is not in the new TensorFlow versions, but rather in the new Flask version! (Flask is the web server we use for the web demo.) In the previous version of Flask, the web server ran in single-threaded mode by default. The latest Flask switched to multi-threading by default. That creates a problem with TensorFlow, because we now initialize the model on the parent thread but try to do inference on a child thread.
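To make the parent-thread/child-thread mismatch concrete, here is a minimal standard-library sketch. It is an analogy only, not TensorFlow or Keras code: `get_default_graph`, `as_default`, `ToyModel`, and `predict_in_model_graph` are hypothetical stand-ins, written on the assumption that TensorFlow 1.x keeps its default graph per thread, so a freshly started thread does not see the graph the model was loaded into.

```python
import contextlib
import threading

# Per-thread storage, standing in for TensorFlow 1.x's per-thread default graph.
# (Illustrative assumption; none of these names are real TensorFlow APIs.)
_state = threading.local()


def get_default_graph():
    """Return the calling thread's graph, creating an empty one on first use."""
    if not hasattr(_state, "graph"):
        _state.graph = set()
    return _state.graph


@contextlib.contextmanager
def as_default(graph):
    """Temporarily make `graph` the calling thread's default graph."""
    previous = get_default_graph()
    _state.graph = graph
    try:
        yield graph
    finally:
        _state.graph = previous


class ToyModel:
    """Hypothetical model that registers its output in the loading thread's graph."""

    def __init__(self):
        self.graph = get_default_graph()
        self.output = "dense_2_1/Softmax:0"
        self.graph.add(self.output)

    def predict(self):
        # Mirrors the failing lookup: the output must live in the current graph.
        if self.output not in get_default_graph():
            raise ValueError(
                "Tensor %s is not an element of this graph." % self.output)
        return "move probabilities"


def run_in_child_thread(fn):
    """Call fn on a new thread and report its return value or ValueError text."""
    outcome = {}

    def target():
        try:
            outcome["value"] = fn()
        except ValueError as exc:
            outcome["error"] = str(exc)

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return outcome


def predict_in_model_graph(model):
    """Re-enter the graph the model was loaded into before predicting."""
    with as_default(model.graph):
        return model.predict()


model = ToyModel()                              # loaded on the parent thread
on_parent = model.predict()                     # same thread: succeeds
on_child = run_in_child_thread(model.predict)   # new thread: lookup fails
on_child_fixed = run_in_child_thread(lambda: predict_in_model_graph(model))
```

In the real web demo, the two analogous remedies would be to serve requests on the thread that loaded the model (Flask's development server accepts `threaded=False` in `app.run`), or to keep a reference to the graph captured at load time and wrap the predict call in `with graph.as_default():`. Which of these the linked commit uses is not shown in this thread.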

Please let me know if this fixes it for you, and sorry for the hassle!

bjordan2010 commented 5 years ago

That worked, now I can continue. Thank you very much.