matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

TypeError: unhashable type: 'ListWrapper' TensorFlow 2.1.0rc0 during training #1889

Open kiflowb777 opened 4 years ago

kiflowb777 commented 4 years ago

Python: 3.6, TensorFlow: 2.1.0rc0, Keras: 2.2.4-tf

After starting training:

 File "C:\project\maskRCNN\model.py", line 349, in compile
    self.keras_model.add_loss(loss)
  File "C:\python36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 1081, in add_loss
    self._graph_network_add_loss(symbolic_loss)
  File "C:\python36\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 1484, in _graph_network_add_loss
    self._insert_layers(new_layers, new_nodes)
  File "C:\python36\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 1439, in _insert_layers
    layer_set = set(self._layers)
  File "C:\python36\lib\site-packages\tensorflow_core\python\training\tracking\data_structures.py", line 598, in __hash__
    raise TypeError("unhashable type: 'ListWrapper'")
TypeError: unhashable type: 'ListWrapper'
kiflowb777 commented 4 years ago

Related topics: https://github.com/tensorflow/tensorflow/issues/34962 https://github.com/tensorflow/tensorflow/issues/33471 https://github.com/tensorflow/tensorflow/issues/32127

dankor commented 4 years ago

Any estimates on this issue?

taylormcclenny commented 4 years ago

How are you running this with TF 2.0? Are there updates or documentation on conversion? Am I missing something??

Sorry for such an open question...

kiflowb777 commented 4 years ago

@taylormcclenny Yes, I tried running my Mask R-CNN code with tf.keras on TF 1.14, 1.15, 2.0 and 2.1rc0. More info about this issue here: https://github.com/tensorflow/tensorflow/issues/34962

The "ListWrapper" bug appear after fixing output layer shape: https://github.com/tensorflow/tensorflow/issues/33785

taylormcclenny commented 4 years ago

@kiflowb777 & @dankor - My understanding is that Mask-RCNN won't run on TF 2.0. See the comments on this article, since TF 2.0's release.

I've been attempting to convert this model to run on TF 2.0, but I just get endless errors. Again, I apologize for a question that is so much broader than your original post, but I can't find the info elsewhere - is there somewhere else I can look to find an updated Mask R-CNN that works (kind of) on TF 2.0?

dankor commented 4 years ago

It also seems to require a heavy rework rather than a one-shot conversion script that just renames methods. Currently, as far as I can see, @tomgross is working on the migration, since he has referenced this bug here.

tomgross commented 4 years ago

I found the cause and the solution. This is the responsible tensorflow / keras commit: https://github.com/tensorflow/tensorflow/commit/45df90d5c2d6b125a10cb0809899c254d49412e6#diff-8eb7e20502209f082d0cb15119a50413R781

As documented, you need to wrap the loss in a zero-argument lambda when adding it to the model. I've added the fix to my TensorFlow 2.0 compatibility PR here: https://github.com/matterport/Mask_RCNN/pull/1896/files#diff-312c7e001d14bbb7ce5f8978f7b04cc3R2171
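For illustration, a minimal, self-contained sketch of that pattern (a toy functional model standing in for keras_model, not the exact PR diff), assuming TF 2.1-era tf.keras with eager execution disabled:

import tensorflow as tf

# The Mask R-CNN code is TF 1.x-style graph code, so disable eager execution.
tf.compat.v1.disable_eager_execution()

# Toy functional model standing in for self.keras_model.
inputs = tf.keras.Input(shape=(4,))
outputs = tf.keras.layers.Dense(2, name="toy_loss_layer")(inputs)
model = tf.keras.Model(inputs, outputs)

# Symbolic loss built from a layer output, as in MaskRCNN.compile().
loss = tf.reduce_mean(model.get_layer("toy_loss_layer").output, keepdims=True)

# Passing the tensor directly goes through _graph_network_add_loss /
# _insert_layers (the path shown in the traceback above); wrapping it in a
# zero-argument lambda bypasses that bookkeeping.
model.add_loss(lambda: loss)
model.compile(optimizer="sgd")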

mmalahe commented 4 years ago

I think the offending lines might be where these protected variables of keras_model are accessed directly:

self.keras_model._losses = []
self.keras_model._per_input_losses = {}

Removing those allowed me to proceed with training without setting those empty lambdas.
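For context, the relevant part of MaskRCNN.compile() then reduces to roughly the following (a sketch mirroring the loop quoted later in this thread, with loss-layer names and config attributes as in the upstream model.py):

# Names of the loss layers added to the Keras graph during build().
loss_names = [
    "rpn_class_loss", "rpn_bbox_loss",
    "mrcnn_class_loss", "mrcnn_bbox_loss", "mrcnn_mask_loss"]
for name in loss_names:
    layer = self.keras_model.get_layer(name)
    # Skip losses that are already registered on the model.
    if layer.output in self.keras_model.losses:
        continue
    loss = (
        tf.reduce_mean(layer.output, keepdims=True)
        * self.config.LOSS_WEIGHTS.get(name, 1.))
    self.keras_model.add_loss(loss)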

travishsu commented 4 years ago

Removing the brackets works well for me:

change loss = (tf.reduce_mean(input_tensor=layer.output, keepdims=True)) to loss = tf.reduce_mean(input_tensor=layer.output, keepdims=True)

mayurmahurkar commented 3 years ago

I think the offending lines might be where these protected variables of keras_model are accessed directly:

self.keras_model._losses = []
self.keras_model._per_input_losses = {}

Removing those allowed me to proceed with training without setting those empty lambdas.

When I removed these lines, I got the following error:


File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 479, in _disallow_in_graph_mode
    " this function with @tf.function.".format(task))
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function. 
kiflowb777 commented 3 years ago

Related topics: https://github.com/tensorflow/tensorflow/issues/47309 https://github.com/tensorflow/tensorflow/issues/39702#issuecomment-631750377

kiflowb777 commented 3 years ago

@mayurmahurkar Add tf.compat.v1.disable_eager_execution() right after import tensorflow as tf.
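That is, at the top of your training script, before any model is built:

import tensorflow as tf

# Run the TF 2.x Keras model in graph mode, as the TF 1.x-style code expects.
tf.compat.v1.disable_eager_execution()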

lovehell commented 3 years ago

There is an issue however when you remove these lines:

self.keras_model._losses = []
self.keras_model._per_input_losses = {}

They were used to prevent duplicated losses. If you remove these lines and do multi-step training, the losses from the previous step won't be cleared and you'll end up with 2x the losses.

This is OK as long as you do not need to change the learning rate. Any hints on how to clear the losses for multi-step training?

Thanks

Behnam72 commented 2 years ago

@lovehell I have this issue. Did you solve it?

lovehell commented 2 years ago

@Behnam72 I didn't. However, as far as I remember, it does not corrupt your training; it only displays the wrong losses.

Behnam72 commented 2 years ago

@lovehell Thanks for the answer. I'd appreciate it if you could answer this:

These are my losses for two epochs (each run separately with model.train):

epoch 1/1 100/100 [==============================] - 69s 626ms/step - batch: 49.5000 - size: 8.0000 - loss: 1.1028 - rpn_class_loss: 0.0173 - rpn_bbox_loss: 0.3368 - mrcnn_class_loss: 0.2695 - mrcnn_bbox_loss: 0.2328 - mrcnn_mask_loss: 0.2465 - val_loss: 1.7118 - val_rpn_class_loss: 0.0155 - val_rpn_bbox_loss: 0.6753 - val_mrcnn_class_loss: 0.3638 - val_mrcnn_bbox_loss: 0.3188 - val_mrcnn_mask_loss: 0.3385

epoch 2/2 100/100 [==============================] - 34s 230ms/step - batch: 49.5000 - size: 8.0000 - loss: 0.4404 - rpn_class_loss: 0.0062 - rpn_bbox_loss: 0.0626 - mrcnn_class_loss: 0.0484 - mrcnn_bbox_loss: 0.0330 - mrcnn_mask_loss: 0.0699 - val_loss: 3.1889 - val_rpn_class_loss: 0.0167 - val_rpn_bbox_loss: 0.7603 - val_mrcnn_class_loss: 0.2439 - val_mrcnn_bbox_loss: 0.2668 - val_mrcnn_mask_loss: 0.3067

For the second epoch, the sum of the 5 component losses, in both training and validation, is 1/2 of "loss" and "val_loss". Is this only because I did not empty the losses? If so, then why are the 5 component losses okay? Because we had these two lines in TF 1.x:

    self.keras_model._losses = []
    self.keras_model._per_input_losses = {}

They empty both the losses and the per-input losses. Does this mean the per-input losses are also doubled now?
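For reference, summing the reported epoch-2 components gives 0.0062 + 0.0626 + 0.0484 + 0.0330 + 0.0699 = 0.2201 ≈ 0.4404 / 2 for training, and 0.0167 + 0.7603 + 0.2439 + 0.2668 + 0.3067 = 1.5944 ≈ 3.1889 / 2 for validation, while the epoch-1 sums match "loss" and "val_loss" exactly. That is consistent with each loss tensor being registered a second time by the second model.train call.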

SindhuKodali commented 2 years ago

https://github.com/matterport/Mask_RCNN/pull/1896/files#diff-312c7e001d14bbb7ce5f8978f7b04cc3R2171

Hey,

I am facing the same error; could you tell me how you solved it?

trueToastedCode commented 2 years ago

I think I found a solution or workaround... change

# First, clear previously set losses to avoid duplication
self.keras_model._losses = []
self.keras_model._per_input_losses = {}

to

# First, clear previously set losses to avoid duplication
try:
    self.keras_model._losses.clear()
except AttributeError:
    pass
try:
    self.keras_model._per_input_losses.clear()
except AttributeError:
    pass

and also change a few lines afterwards from:

for name in loss_names:
    layer = self.keras_model.get_layer(name)
    if layer.output in self.keras_model.losses:
        continue
    loss = (
        tf.reduce_mean(layer.output, keepdims=True)
        * self.config.LOSS_WEIGHTS.get(name, 1.))
    self.keras_model.add_loss(loss)

to

existing_layer_names = []
for name in loss_names:
    layer = self.keras_model.get_layer(name)
    if layer is None or name in existing_layer_names:
        continue
    existing_layer_names.append(name)
    loss = (tf.reduce_mean(layer.output, keepdims=True)
            * self.config.LOSS_WEIGHTS.get(name, 1.))
    self.keras_model.add_loss(loss)

as well as changing self.keras_model.metrics_tensors.append(loss) to self.keras_model.add_metric(loss, name=name, aggregation='mean').
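For reference, inside the loop in compile() that registers the per-loss metrics, that last change looks roughly like this (a sketch; the surrounding loop is assumed to mirror the add_loss loop above):

for name in loss_names:
    layer = self.keras_model.get_layer(name)
    loss = (tf.reduce_mean(layer.output, keepdims=True)
            * self.config.LOSS_WEIGHTS.get(name, 1.))
    # old (TF 1.x Keras): self.keras_model.metrics_tensors.append(loss)
    self.keras_model.add_metric(loss, name=name, aggregation='mean')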

sampath9875 commented 1 year ago

as well as changing self.keras_model.metrics_tensors.append(loss) to self.keras_model.add_metric(loss, name=name, aggregation='mean').

After making this change, I get an error of "unhashable type: 'ListWrapper'". Not sure how to proceed after this.

WesYarber commented 1 year ago

@sampath9875 Did you end up finding a working solution?

sampath9875 commented 1 year ago

@sampath9875 Did you end up finding a working solution?

No. I decided to try Detectron's version of Mask RCNN.

trueToastedCode commented 1 year ago

@sampath9875 Did you end up finding a working solution?

No. I decided to try Detectron's version of Mask RCNN.

Does this work on Apple Silicon?

sampath9875 commented 1 year ago

I believe it should, provided all the required packages are installed. Detectron2 is built on PyTorch, and it also requires a whole list of additional packages.

gbinduo commented 10 months ago

I think the offending lines might be where these protected variables of keras_model are accessed directly:

self.keras_model._losses = []
self.keras_model._per_input_losses = {}

Removing those allowed me to proceed with training without setting those empty lambdas.

It works perfectly. Thanks a lot!