usc-isi-i2 / dsbox-ta2

The DSBox TA2 component
MIT License

Image primitives fail when loaded more than once #178

Closed kyao closed 6 years ago

kyao commented 6 years ago

See these two pages for possible solutions:

https://stackoverflow.com/questions/46911596/why-does-tensorflow-say-a-tensor-is-not-an-element-of-this-graph-when-training-a

https://github.com/tensorflow/tensorflow/issues/14356

2018-08-09 13:41:10,524 [ERROR] grpc._server -- Exception iterating responses: Cannot interpret feed_dict key as Tensor: Tensor Tensor("Placeholder:0", shape=(7, 7, 3, 64), dtype=float32) is not an element of this graph.
Traceback (most recent call last):
  File "/nfs1/dsbox-repo/kyao/miniconda3/envs/dsbox-devel-710/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1075, in _run
    subfeed, allow_tensor=True, allow_operation=False)
  File "/nfs1/dsbox-repo/kyao/miniconda3/envs/dsbox-devel-710/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3590, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/nfs1/dsbox-repo/kyao/miniconda3/envs/dsbox-devel-710/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3669, in _as_graph_element_locked
    raise ValueError("Tensor %s is not an element of this graph." % obj)
ValueError: Tensor Tensor("Placeholder:0", shape=(7, 7, 3, 64), dtype=float32) is not an element of this graph.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/nfs1/dsbox-repo/kyao/miniconda3/envs/dsbox-devel-710/lib/python3.6/site-packages/grpc/_server.py", line 405, in _take_response_from_response_iterator
    return next(response_iterator), True
  File "/nfs1/dsbox-repo/kyao/dsbox-710/dsbox-ta2/python/dsbox/server/ta2_servicer.py", line 946, in GetFitSolutionResults
    old_fitted_pipeline, _ = FittedPipeline.load(self.output_dir, fitted_pipeline_id, log_dir=self.log_dir)
  File "/nfs1/dsbox-repo/kyao/dsbox-710/dsbox-ta2/python/dsbox/pipeline/fitted_pipeline.py", line 223, in load
    each_step = pickle.load(f)
  File "/nfs1/dsbox-repo/kyao/dsbox-710/d3m/d3m/primitive_interfaces/base.py", line 726, in __setstate__
    self.__init__(**state['constructor'])  # type: ignore
  File "/nfs1/dsbox-repo/kyao/dsbox-710/dsbox-featurizer/dsbox/datapreprocessing/featurizer/image/net_image_feature.py", line 107, in __init__
    self._RESNET50_MODEL = resnet50.ResNet50(weights='imagenet')
  File "/nfs1/dsbox-repo/kyao/miniconda3/envs/dsbox-devel-710/lib/python3.6/site-packages/keras/applications/resnet50.py", line 271, in ResNet50
    model.load_weights(weights_path)
  File "/nfs1/dsbox-repo/kyao/miniconda3/envs/dsbox-devel-710/lib/python3.6/site-packages/keras/engine/topology.py", line 2667, in load_weights
    f, self.layers, reshape=reshape)
  File "/nfs1/dsbox-repo/kyao/miniconda3/envs/dsbox-devel-710/lib/python3.6/site-packages/keras/engine/topology.py", line 3393, in load_weights_from_hdf5_group
    K.batch_set_value(weight_value_tuples)
  File "/nfs1/dsbox-repo/kyao/miniconda3/envs/dsbox-devel-710/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2377, in batch_set_value
    get_session().run(assign_ops, feed_dict=feed_dict)
  File "/nfs1/dsbox-repo/kyao/miniconda3/envs/dsbox-devel-710/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/nfs1/dsbox-repo/kyao/miniconda3/envs/dsbox-devel-710/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1078, in _run
    'Cannot interpret feed_dict key as Tensor: ' + e.args[0])
TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("Placeholder:0", shape=(7, 7, 3, 64), dtype=float32) is not an element of this graph.

proska commented 6 years ago

The TensorFlow library is not thread-safe or fork-safe, which is problematic in our system. A workaround is to load it only in the workers, or to modify the primitives to use the fork-safe version.

kyao commented 6 years ago

This problem arose in the context of TA2-TA3 interaction, where I disabled our multi-processing due to gRPC limitations. As a quick fix to our system, I modified our TA2 to call clear_session(), and that seems to have worked. But the real fix, based on https://github.com/tensorflow/tensorflow/issues/14356 , is to store the TensorFlow graph and reuse it:

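    import tensorflow as tf
    from keras.applications.resnet50 import ResNet50

    model = ResNet50(weights="imagenet")
    # This is key: save the graph right after loading the model
    graph = tf.get_default_graph()

    # Later: run inference inside the saved graph
    # (image is assumed to be an already preprocessed input batch)
    with graph.as_default():
        preds = model.predict(image)
    # ... etc.

kyao commented 6 years ago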

Fixed with https://github.com/usc-isi-i2/dsbox-featurizer/commit/3d862033f09c12ff31942a1ba804cf3efa8b4532 and https://github.com/usc-isi-i2/dsbox-featurizer/commit/d2972b29170513aa2bac9029ae1f520ebfbfbd8d