rossumai / keras-multi-gpu

Multi-GPU data-parallel training in Keras
MIT License

Can't convert Operation 'StagingArea_put' to Tensor #9

Open bzamecnik opened 5 years ago

bzamecnik commented 5 years ago

When running with tensorflow-1.12.0 and Keras-2.2.4:

CUDA_VISIBLE_DEVICES=3 python keras_staging_area_cifar10.py

I get the following error:

training pipelined model:
Traceback (most recent call last):
  File "keras_staging_area_cifar10.py", line 73, in <module>
    callbacks=[staging_area_callback, gauge])
  File "/home/bzamecnik/.virtualenvs/rossum/local/lib/python2.7/site-packages/keras/engine/training.py", line 1010, in fit
    self._make_train_function()
  File "/home/bzamecnik/.virtualenvs/rossum/local/lib/python2.7/site-packages/keras/engine/training.py", line 519, in _make_train_function
    **self._function_kwargs)
  File "/home/bzamecnik/.virtualenvs/rossum/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2744, in function
    return Function(inputs, outputs, updates=updates, **kwargs)
  File "/home/bzamecnik/.virtualenvs/rossum/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2567, in __init__
    self.fetches = [tf.identity(x) for x in self.fetches]
  File "/home/bzamecnik/.virtualenvs/rossum/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 81, in identity
    return gen_array_ops.identity(input, name=name)
  File "/home/bzamecnik/.virtualenvs/rossum/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3454, in identity
    "Identity", input=input, name=name)
  File "/home/bzamecnik/.virtualenvs/rossum/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 513, in _apply_op_helper
    raise err
TypeError: Can't convert Operation 'StagingArea_put' to Tensor (target dtype=None, name=u'input', as_ref=False)

Similar: https://stackoverflow.com/questions/47750300/tensorflow-cant-convert-operation-to-tensor

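The cause appears to be that Keras wraps every fetch in tf.identity(), including the StagingArea.put operation. tf.identity() accepts only tensors, and StagingArea.put is an Operation with no output tensor: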

   2565         # (since the outputs of fetches are never returned).
   2566         # This requires us to wrap fetches in `identity` ops.
-> 2567         self.fetches = [tf.identity(x) for x in self.fetches]
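
To illustrate the failure and one possible workaround without needing TensorFlow installed, here is a minimal sketch. FakeTensor, FakeOperation, fake_identity and wrap_fetches are hypothetical names made up for this example; they only imitate the relevant behavior and are not part of Keras or TensorFlow. The idea is to wrap only tensor fetches in identity and pass operations through unchanged. This is an assumption about what a fix could look like, not a confirmed patch.

```python
class FakeTensor(object):
    """Hypothetical stand-in for tf.Tensor (something that carries a value)."""
    def __init__(self, name):
        self.name = name


class FakeOperation(object):
    """Hypothetical stand-in for tf.Operation, e.g. the result of StagingArea.put()."""
    def __init__(self, name):
        self.name = name


def fake_identity(x):
    """Imitates tf.identity for this sketch: tensors only, operations are rejected."""
    if not isinstance(x, FakeTensor):
        raise TypeError("Can't convert Operation '%s' to Tensor" % x.name)
    return FakeTensor(x.name + "/identity")


def wrap_fetches(fetches, identity, is_operation):
    """Wrap tensor fetches in identity; leave operations as they are."""
    return [x if is_operation(x) else identity(x) for x in fetches]


fetches = [FakeTensor("loss"), FakeOperation("StagingArea_put")]

# Unconditional wrapping, as in the excerpt above, fails on the operation.
try:
    [fake_identity(x) for x in fetches]
    unconditional_error = None
except TypeError as err:
    unconditional_error = str(err)

# Conditional wrapping skips the operation and succeeds.
wrapped = wrap_fetches(
    fetches, fake_identity, lambda x: isinstance(x, FakeOperation))

print(unconditional_error)
print([x.name for x in wrapped])
```

With real TensorFlow, the same helper could presumably be called with tf.identity and lambda x: isinstance(x, tf.Operation), but I have not verified that against Keras 2.2.4.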