keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

The weights file will not be closed after called load_weights function.  #5430

Closed gyan-garcia closed 7 years ago

gyan-garcia commented 7 years ago

I have Keras 1.2.2 and TensorFlow 1.0.0 running on a Django server on a Windows 10 machine.

My current architecture loads the weights (load_weights) each time there is a relevant REST API call. This works as expected on the first call just after starting the server, but every subsequent call fails when calling load_weights. I presume the .h5 file stays open after load_weights is called. This is the code that gets executed on each REST API call:

# (body of the prediction function that runs on each REST API call)
from keras.models import model_from_json

# load json and create model
with open('C:\\TensorFlow\\ImageRecognizerServer\\ImageRecognizerServer\\app\\model.json', 'r') as json_file:
    loaded_model_json = json_file.read()
loaded_model = model_from_json(loaded_model_json)

# load weights into new model
loaded_model.load_weights("C:\\TensorFlow\\ImageRecognizerServer\\ImageRecognizerServer\\app\\model_weights.h5")
print("Loaded model from disk")

loaded_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print("Compiling model")

return predict_class(loaded_model, img_matrix, 1)

This is the callstack I get:

Traceback (most recent call last):
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\tensorflow\python\client\session.py", line 267, in __init__
    fetch, allow_tensor=True, allow_operation=True))
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\tensorflow\python\framework\ops.py", line 2318, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\tensorflow\python\framework\ops.py", line 2402, in _as_graph_element_locked
    raise ValueError("Operation %s is not an element of this graph." % obj)
ValueError: Operation name: "init" op: "NoOp" input: "^convolution2d_1_W/Assign" input: "^convolution2d_1_b/Assign" input: "^convolution2d_2_W/Assign" input: "^convolution2d_2_b/Assign" input: "^dense_1_W/Assign" input: "^dense_1_b/Assign" input: "^dense_2_W/Assign" input: "^dense_2_b/Assign" is not an element of this graph.

Thanks for this beautiful component!

patyork commented 7 years ago

So I couldn't reproduce this on Linux.

Could you try to just load the weights multiple times in a row? If it is in fact a file being left open (which I'm not sure it is) it should fail on the second load.

e.g.:

loaded_model.load_weights("C:\\TensorFlow\\ImageRecognizerServer\\ImageRecognizerServer\\app\\model_weights.h5")
print("Loaded model from disk")
loaded_model.load_weights("C:\\TensorFlow\\ImageRecognizerServer\\ImageRecognizerServer\\app\\model_weights.h5")  # should fail here, theoretically
print("Loaded model from disk2")

If that loads, perhaps try to do the entire process twice, and see where it fails. So copy-paste your code, changing the first return to tmp =
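
The "run the whole process twice and see where it fails" suggestion can be sketched with a small helper. This is only an illustration: `run_rounds` and the stand-in steps are hypothetical names, not part of Keras or of the reporter's code, and the real steps (building the model, loading weights, compiling, predicting) would have to be substituted in.

```python
def run_rounds(steps, rounds=2):
    """Run each (name, callable) step in order, `rounds` times.

    Returns None if every step succeeds in every round, otherwise a tuple
    (round_number, step_name, exception) describing the first failure.
    """
    for round_number in range(1, rounds + 1):
        for name, step in steps:
            try:
                step()
            except Exception as exc:
                return round_number, name, exc
    return None


if __name__ == "__main__":
    # Stand-in steps (hypothetical): replace them with the real
    # model_from_json / load_weights / compile / predict calls.
    state = {"loads": 0}

    def build_model():
        pass

    def load_weights():
        state["loads"] += 1
        if state["loads"] > 1:
            raise ValueError("simulated failure on the second load")

    print(run_rounds([("build_model", build_model),
                      ("load_weights", load_weights)]))
```

Running the same steps inside the web server's request handler and in a plain Python prompt would then show whether the failure depends on the environment rather than on the step itself.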

gyan-garcia commented 7 years ago

Hi, thanks for looking at this. I did load the weights multiple times in a row and it works correctly. Calling my method repeatedly from a Python prompt also works. It only fails when I call it from the REST API, after the first call. And yes, it fails when load_weights is called.

I am sharing a more complete call stack that I hope provides more information:

Internal Server Error: /rest_api/smart_canvas/0/get_image_class/
Traceback (most recent call last):
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\tensorflow\python\client\session.py", line 267, in __init__
    fetch, allow_tensor=True, allow_operation=True))
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\tensorflow\python\framework\ops.py", line 2318, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\tensorflow\python\framework\ops.py", line 2402, in _as_graph_element_locked
    raise ValueError("Operation %s is not an element of this graph." % obj)
ValueError: Operation name: "init" op: "NoOp" input: "^convolution2d_1_W/Assign" input: "^convolution2d_1_b/Assign" input: "^convolution2d_2_W/Assign" input: "^convolution2d_2_b/Assign" input: "^dense_1_W/Assign" input: "^dense_1_b/Assign" input: "^dense_2_W/Assign" input: "^dense_2_b/Assign" is not an element of this graph.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\django\core\handlers\exception.py", line 39, in inner
    response = get_response(request)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\django\core\handlers\base.py", line 249, in _legacy_get_response
    response = self._get_response(request)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\django\core\handlers\base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\django\core\handlers\base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\django\views\decorators\csrf.py", line 58, in wrapped_view
    return view_func(*args, **kwargs)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\rest_framework\viewsets.py", line 83, in view
    return self.dispatch(request, *args, **kwargs)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\rest_framework\views.py", line 483, in dispatch
    response = self.handle_exception(exc)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\rest_framework\views.py", line 443, in handle_exception
    self.raise_uncaught_exception(exc)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\rest_framework\views.py", line 480, in dispatch
    response = handler(request, *args, **kwargs)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\app\viewsets.py", line 40, in get_image_class
    prediction = predict_image_class(x_coordinates, y_coordinates, json_dict['width'], json_dict['height'])
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\app\predict_image.py", line 54, in predict_image_class
    loaded_model.load_weights("C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\app\model_weights.h5")
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\keras\engine\topology.py", line 2708, in load_weights
    self.load_weights_from_hdf5_group(f)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\keras\engine\topology.py", line 2794, in load_weights_from_hdf5_group
    K.batch_set_value(weight_value_tuples)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\keras\backend\tensorflow_backend.py", line 1881, in batch_set_value
    get_session().run(assign_ops, feed_dict=feed_dict)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\keras\backend\tensorflow_backend.py", line 125, in get_session
    _initialize_variables()
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\keras\backend\tensorflow_backend.py", line 282, in _initialize_variables
    sess.run(tf.variables_initializer(uninitialized_variables))
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\tensorflow\python\client\session.py", line 766, in run
    run_metadata_ptr)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\tensorflow\python\client\session.py", line 951, in _run
    fetch_handler = _FetchHandler(self._graph, fetches, feed_dict_string)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\tensorflow\python\client\session.py", line 407, in __init__
    self._fetch_mapper = _FetchMapper.for_fetch(fetches)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\tensorflow\python\client\session.py", line 238, in for_fetch
    return _ElementFetchMapper(fetches, contraction_fn)
  File "C:\TensorFlow\ImageRecognizerServer\ImageRecognizerServer\Django_TS_Test\lib\site-packages\tensorflow\python\client\session.py", line 274, in __init__
    'Tensor. (%s)' % (fetch, str(e)))
ValueError: Fetch argument <tensorflow.python.framework.ops.Operation object at 0x0000024A51ABFE80> cannot be interpreted as a Tensor. (Operation name: "init" op: "NoOp" input: "^convolution2d_1_W/Assign" input: "^convolution2d_1_b/Assign" input: "^convolution2d_2_W/Assign" input: "^convolution2d_2_b/Assign" input: "^dense_1_W/Assign" input: "^dense_1_b/Assign" input: "^dense_2_W/Assign" input: "^dense_2_b/Assign" is not an element of this graph.)
[17/Feb/2017 13:24:47] "POST /rest_api/smart_canvas/0/get_image_class/ HTTP/1.1" 500 24315

patyork commented 7 years ago

It looks like it's completely external to Keras, then. This would be an issue to raise on Stack Overflow, or perhaps the TF board, with a reproducible example.

One thing to check is whether or not the model is being deleted/disposed of (after it is used, and before the next one is used). It looks almost as if the model is being deleted and then asked to load weights back in. I'd imagine it's something to do with threading/asynchronicity in the web server.

gyan-garcia commented 7 years ago

Thanks patyork, I will close the issue then =)

patyork commented 7 years ago

You might also consider just loading the model once, on server start, instead of reloading it on every API call, unless there's a specific reason you can't or don't want to.
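
A minimal sketch of the "load once" idea, assuming a threaded web server: a small thread-safe holder that runs a loader function the first time it is asked for the model and returns the same object afterwards. `LazyModel` and the `loader` argument are hypothetical names for illustration; in the reporter's setup the loader would wrap the `model_from_json`, `load_weights` and `compile` calls. Whether sharing one model across request threads avoids the graph error is not confirmed in this thread.

```python
import threading


class LazyModel:
    """Build a model once and hand the same instance to every caller."""

    def __init__(self, loader):
        self._loader = loader          # hypothetical: callable that builds and returns the model
        self._model = None
        self._loaded = False
        self._lock = threading.Lock()

    def get(self):
        # Double-checked locking: take the lock only until the model exists.
        if not self._loaded:
            with self._lock:
                if not self._loaded:
                    self._model = self._loader()
                    self._loaded = True
        return self._model
```

A module-level instance (for example in the module that defines the view) would then be created once per server process, and each request handler would call `get()` instead of rebuilding the model.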

gyan-garcia commented 7 years ago

That would be ideal. I am unfamiliar with how to cache objects in Django, so I am doing a little research into that just now =)

gyan-garcia commented 7 years ago

I got around this issue by calling clear_session from keras.backend.tensorflow_backend after each request.

It seems like a bug in TensorFlow.
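
The workaround above (clearing the session after each request) could be wrapped so that the cleanup runs even when the prediction raises. The sketch below is an assumption about how one might structure it, not the reporter's actual code: `with_cleanup` is a hypothetical helper, and the cleanup callable is passed in so the helper itself does not depend on Keras.

```python
import functools


def with_cleanup(cleanup):
    """Decorator factory: call `cleanup()` after the wrapped function,
    whether it returns normally or raises."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            finally:
                cleanup()
        return wrapper
    return decorator


# Usage sketch (needs Keras with the TensorFlow backend; not executed here):
#   from keras.backend.tensorflow_backend import clear_session
#
#   @with_cleanup(clear_session)
#   def predict_image_class(x_coordinates, y_coordinates, width, height):
#       ...  # build the model, load weights, compile, predict
```

Since `clear_session` discards the current TensorFlow graph, the model has to be rebuilt on the next request, which matches the per-request loading described in this issue.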

wugoukanle commented 7 years ago

I hit the same error, but in my case I use VGG16 in a PyQt thread. The thread could only run once; the second run failed with "Process finished with exit code 1". I used the method described in [http://stackoverflow.com/questions/34363552/python-process-finished-with-exit-code-1-when-using-pycharm-and-pyqt5] to track down the error, which gave the following information:

<class 'ValueError'> Fetch argument <tf.Operation 'init' type=NoOp> cannot be interpreted as a Tensor. (Operation name: "init" op: "NoOp" input: "^conv1_1_W/Assign" input: "^conv1_1_b/Assign" input: "^conv1_2_W/Assign" input: "^conv1_2_b/Assign" input: "^conv2_1_W/Assign" input: "^conv2_1_b/Assign" input: "^conv2_2_W/Assign" input: "^conv2_2_b/Assign" input: "^conv3_1_W/Assign" input: "^conv3_1_b/Assign" input: "^conv3_2_W/Assign" input: "^conv3_2_b/Assign" input: "^conv3_3_W/Assign" input: "^conv3_3_b/Assign" input: "^conv4_1_W/Assign" input: "^conv4_1_b/Assign" input: "^conv4_2_W/Assign" input: "^conv4_2_b/Assign" input: "^conv4_3_W/Assign" input: "^conv4_3_b/Assign" input: "^conv5_1_W/Assign" input: "^conv5_1_b/Assign" input: "^conv5_2_W/Assign" input: "^conv5_2_b/Assign" input: "^conv5_3_W/Assign" input: "^conv5_3_b/Assign" input: "^dense_3_W/Assign" input: "^dense_3_b/Assign" input: "^dense_4_W/Assign" input: "^dense_4_b/Assign" is not an element of this graph.) <traceback object at 0x00000197A939A108>

Finally, I call keras.backend.tensorflow_backend.clear_session() at the end of the VGG16 code, and it works ^^ Thanks, gyan-garcia.

klxts commented 7 years ago

This method solved a problem that had puzzled me for a long time. Thank you very much!