(FailedPreconditionError: 2 root error(s) found.) While upgrading tensorflow to 2.x

amalmathews commented 4 years ago

I recently updated TensorFlow to 2.2 to get GPU support, but after upgrading, my Mask R-CNN program started throwing many kinds of errors.

With my limited deep learning experience, I solved most of the issues, but I am stuck on this one error and have not been able to resolve it.


Checkpoint Path: ./Cigg_Bud-Gpu/logs/cig_butts20200601T0825/mask_rcnn_cig_butts_{epoch:04d}.h5
Selecting layers to train
fpn_c5p5               (Conv2D)
fpn_c4p4               (Conv2D)
fpn_c3p3               (Conv2D)
fpn_c2p2               (Conv2D)
fpn_p5                 (Conv2D)
fpn_p2                 (Conv2D)
fpn_p3                 (Conv2D)
fpn_p4                 (Conv2D)
In model:  rpn_model
    rpn_conv_shared        (Conv2D)
    rpn_class_raw          (Conv2D)
    rpn_bbox_pred          (Conv2D)
mrcnn_mask_conv1       (TimeDistributed)
mrcnn_mask_bn1         (TimeDistributed)
mrcnn_mask_conv2       (TimeDistributed)
mrcnn_mask_bn2         (TimeDistributed)
mrcnn_class_conv1      (TimeDistributed)
mrcnn_class_bn1        (TimeDistributed)
mrcnn_mask_conv3       (TimeDistributed)
mrcnn_mask_bn3         (TimeDistributed)
mrcnn_class_conv2      (TimeDistributed)
mrcnn_class_bn2        (TimeDistributed)
mrcnn_mask_conv4       (TimeDistributed)
mrcnn_mask_bn4         (TimeDistributed)
mrcnn_bbox_fc          (TimeDistributed)
mrcnn_mask_deconv      (TimeDistributed)
mrcnn_class_logits     (TimeDistributed)
mrcnn_mask             (TimeDistributed)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/indexed_slices.py:434: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
WARNING:tensorflow:Model failed to serialize as JSON. Ignoring... An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
  @tf.function
  def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
      added = my_constant * 2
The graph tensor has name: anchors/Variable:0
/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py:49: UserWarning: Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the `keras.utils.Sequence class.
  UserWarning('Using a generator with `use_multiprocessing=True`'

Epoch 1/4
---------------------------------------------------------------------------
FailedPreconditionError                   Traceback (most recent call last)
<ipython-input-13-05f25cba3043> in <module>()
      4             learning_rate=config.LEARNING_RATE,
      5             epochs=4,
----> 6             layers='heads')

6 frames
/usr/local/lib/python3.6/dist-packages/mrcnn/model.py in train(self, train_dataset, val_dataset, learning_rate, epochs, layers, augmentation)
   2350             max_queue_size=100,
   2351             workers=workers,
-> 2352             use_multiprocessing=True,
   2353         )
   2354         self.epoch = max(self.epoch, epochs)

/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name + '` call to the ' +
     90                               'Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1730             use_multiprocessing=use_multiprocessing,
   1731             shuffle=shuffle,
-> 1732             initial_epoch=initial_epoch)
   1733 
   1734     @interfaces.legacy_generator_methods_support

/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    218                                             sample_weight=sample_weight,
    219                                             class_weight=class_weight,
--> 220                                             reset_metrics=False)
    221 
    222                 outs = to_list(outs)

/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight, reset_metrics)
   1512             ins = x + y + sample_weights
   1513         self._make_train_function()
-> 1514         outputs = self.train_function(ins)
   1515 
   1516         if reset_metrics:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/backend.py in __call__(self, inputs)
   3630 
   3631     fetched = self._callable_fn(*array_vals,
-> 3632                                 run_metadata=self.run_metadata)
   3633     self._call_fetch_callbacks(fetched[-len(self._fetches):])
   3634     output_structure = nest.pack_sequence_as(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1470         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1471                                                self._handle, args,
-> 1472                                                run_metadata_ptr)
   1473         if run_metadata:
   1474           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

**FailedPreconditionError: 2 root error(s) found.
  (0) Failed precondition: Error while reading resource variable anchors/Variable from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/anchors/Variable/N10tensorflow3VarE does not exist.
     [[{{node ROI/ReadVariableOp}}]]
     [[Mean_5/_1303]]
  (1) Failed precondition: Error while reading resource variable anchors/Variable from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/anchors/Variable/N10tensorflow3VarE does not exist.
     [[{{node ROI/ReadVariableOp}}]]
0 successful operations.
0 derived errors ignored.**

I have been stuck on this problem and I really need help. Could someone please help me with this issue?

innat commented 4 years ago

This repo has not been updated for tf 2.x. You should downgrade to tf 1.13.x.

amalmathews commented 4 years ago

But then how do I use the GPU for computing? GPU support is only available for TensorFlow 2.x, and computing on the CPU takes far more time.

innat commented 4 years ago

You can still use the GPU for computing; there will be no problem if you downgrade to tf 1.x. The main thing to keep in mind is that there is no official support (from this GitHub author) for tf 2.x. Some people are trying to make it work, but their ports may still be missing something (I haven't tried their approach).

Just to let you know, I recently made a notebook for the ongoing Kaggle competition on wheat head detection using this implementation. The default (or updated) kernel used tf 2.1, so I ran into some trouble with that and in the end managed to downgrade to tf 1.x. While dealing with this issue, I posted a request in the discussion forum (link here); maybe you can find some more insight there.
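
As a rough illustration of the version constraint described above, here is a minimal sketch that checks whether a TensorFlow version string belongs to the 1.x line before training is started. The helper names `tf_major_version` and `is_tf1` are hypothetical and are not part of this repo or of TensorFlow; the check only encodes the assumption from this thread that the repo targets tf 1.x.

```python
import re


def tf_major_version(version_string):
    """Return the major version from a string such as '2.2.0' or '1.13.1'."""
    match = re.match(r"\s*(\d+)\.", version_string)
    if match is None:
        raise ValueError("Unrecognized version string: {!r}".format(version_string))
    return int(match.group(1))


def is_tf1(version_string):
    """True when the version is in the 1.x line (assumed here to be what the repo expects)."""
    return tf_major_version(version_string) == 1
```

In a notebook, one could pass `tf.__version__` to `is_tf1` right after `import tensorflow as tf` and stop early if it returns `False`. On tf 1.x, `tf.test.is_gpu_available()` can then be used to confirm that the GPU is visible.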

amalmathews commented 4 years ago

I did, but now I'm getting a new error: OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.

Please help with this recurring error

innat commented 4 years ago

What is your tf version now? Where are you running your code? Can you share reproducible code?
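
One possible way to gather the details asked for above is a small environment report like the sketch below. The function name `collect_environment` is hypothetical; it relies only on the standard library plus the `__version__` attribute that `tensorflow` and `keras` expose, and it reports a package as "not available" if importing it fails for any reason.

```python
import importlib
import platform


def collect_environment(packages=("tensorflow", "keras")):
    """Collect Python, OS, and package versions for a bug report."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Missing or broken installs are both reported the same way.
            info[name] = "not available"
        else:
            info[name] = getattr(module, "__version__", "unknown")
    return info
```

Printing the returned dictionary and pasting it into the thread, together with the failing cell, would answer the version and environment questions in one go.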

matterport / Mask_RCNN

(FailedPreconditionError: 2 root error(s) found.) While upgrading tensorflow to 2.x #2219