Open kiflowb777 opened 4 years ago
Any estimates on this issue?
How are you running this with TF 2.0? Are there updates or documentation on conversion? Am I missing something??
Sorry for such an open question...
@taylormcclenny Yes, I tried running my Mask R-CNN code with tf.keras on TF 1.14, 1.15, 2.0 and 2.1rc0. More info about this issue here: https://github.com/tensorflow/tensorflow/issues/34962
The "ListWrapper" bug appears after fixing the output layer shape: https://github.com/tensorflow/tensorflow/issues/33785
@kiflowb777 & @dankor - My understanding is that Mask-RCNN won't run on TF 2.0. See the comments posted on this article since TF 2.0's release.
I've been attempting to convert this model to run on TF 2.0 but I just get endless errors. Again, I apologize for a question that is so much broader than your original post, but I can't find the info elsewhere - is there somewhere else I can look to find an updated Mask-RCNN that works (more or less) on TF 2.0?
It seems to require heavy rework rather than a single run of a conversion script that renames methods. Currently, as far as I can see, @tomgross is working on the migration, since he has marked this bug here.
I found the cause and the solution. This is the responsible tensorflow / keras commit: https://github.com/tensorflow/tensorflow/commit/45df90d5c2d6b125a10cb0809899c254d49412e6#diff-8eb7e20502209f082d0cb15119a50413R781
As documented, you need to wrap the loss function in an empty lambda when adding it to the model. I've added the fix to my TensorFlow 2.0 compatibility PR here: https://github.com/matterport/Mask_RCNN/pull/1896/files#diff-312c7e001d14bbb7ce5f8978f7b04cc3R2171
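For readers hitting the same error, here is a minimal pure-Python sketch of why the zero-argument lambda helps: newer tf.keras versions accept a callable passed to `add_loss` and evaluate it lazily, instead of requiring a concrete tensor at call time. `FakeKerasModel` is a hypothetical stand-in, not the real Keras class; it only illustrates the deferred-evaluation pattern.

```python
# Hypothetical stand-in for tf.keras's loss tracking: it accepts either a
# plain value or a zero-arg callable, and evaluates callables lazily.
class FakeKerasModel:
    def __init__(self):
        self._loss_fns = []

    def add_loss(self, loss):
        # mimic the pattern: wrap plain values, keep callables as-is
        self._loss_fns.append(loss if callable(loss) else (lambda: loss))

    def total_loss(self):
        # losses are only materialized here, when the model asks for them
        return sum(fn() for fn in self._loss_fns)

model = FakeKerasModel()
model.add_loss(lambda: 0.25)   # the documented workaround: wrap in a lambda
model.add_loss(0.75)           # plain value, wrapped internally in this sketch
print(model.total_loss())      # → 1.0
```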
I think the offending lines might be where these protected variables of keras_model
are accessed directly:
self.keras_model._losses = []
self.keras_model._per_input_losses = {}
Removing those allowed me to proceed with training without setting those empty lambdas.
Removing the brackets works well for me. Modify from
loss = (tf.reduce_mean(input_tensor=layer.output, keepdims=True))
to
loss = tf.reduce_mean(input_tensor=layer.output, keepdims=True)
I think the offending lines might be where these protected variables of keras_model are accessed directly:
self.keras_model._losses = []
self.keras_model._per_input_losses = {}
Removing those allowed me to proceed with training without setting those empty lambdas.
When I removed these lines, I got the following error:
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 479, in _disallow_in_graph_mode
" this function with @tf.function.".format(task))
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
@mayurmahurkar Add tf.compat.v1.disable_eager_execution()
after import tensorflow as tf
There is an issue however when you remove these lines:
self.keras_model._losses = []
self.keras_model._per_input_losses = {}
They were used to prevent duplicated losses. If you remove these lines and do a multi-step training, the losses for the previous step won't be cleared and you'll end up with x2 losses.
This is OK as long as you do not need to change the learning rate. Any hints on clearing losses for multi-step training?
Thanks
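To make the doubling concrete, here is a hedged, TensorFlow-free sketch of how losses accumulate across training stages when the clearing lines are removed. `FakeModel` and `compile_step` are illustrative names, not the real Mask_RCNN code:

```python
# Illustration of loss accumulation across multi-step training when the
# clearing lines (self.keras_model._losses = [] etc.) are removed.
class FakeModel:
    def __init__(self):
        self._losses = []

    def add_loss(self, loss):
        self._losses.append(loss)

def compile_step(model, losses):
    # without clearing first, losses from previous compile calls remain
    for loss in losses:
        model.add_loss(loss)

m = FakeModel()
compile_step(m, [0.5, 0.3])  # first training stage
compile_step(m, [0.5, 0.3])  # second stage re-adds the same losses
print(len(m._losses))        # → 4 entries instead of 2: reported loss doubles
```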
@lovehell I have this issue. Did you solve it?
@Behnam72 I didn't. However, as far as I remember, it does not corrupt your training; it only displays wrong losses.
@lovehell thanks for the answer. I'd appreciate it if you can answer this:
These are my losses for two epochs (each ran separately with model.train):
epoch 1/1 100/100 [==============================] - 69s 626ms/step - batch: 49.5000 - size: 8.0000 - loss: 1.1028 - rpn_class_loss: 0.0173 - rpn_bbox_loss: 0.3368 - mrcnn_class_loss: 0.2695 - mrcnn_bbox_loss: 0.2328 - mrcnn_mask_loss: 0.2465 - val_loss: 1.7118 - val_rpn_class_loss: 0.0155 - val_rpn_bbox_loss: 0.6753 - val_mrcnn_class_loss: 0.3638 - val_mrcnn_bbox_loss: 0.3188 - val_mrcnn_mask_loss: 0.3385
epoch 2/2 100/100 [==============================] - 34s 230ms/step - batch: 49.5000 - size: 8.0000 - loss: 0.4404 - rpn_class_loss: 0.0062 - rpn_bbox_loss: 0.0626 - mrcnn_class_loss: 0.0484 - mrcnn_bbox_loss: 0.0330 - mrcnn_mask_loss: 0.0699 - val_loss: 3.1889 - val_rpn_class_loss: 0.0167 - val_rpn_bbox_loss: 0.7603 - val_mrcnn_class_loss: 0.2439 - val_mrcnn_bbox_loss: 0.2668 - val_mrcnn_mask_loss: 0.3067
For the second epoch, the sum of the 5 losses in both training and validation is half of "loss" and "val_loss". Is this only because I did not empty the losses? If so, then why are the 5 individual losses okay? Because we had these two lines in TF 1.x:
self.keras_model._losses = []
self.keras_model._per_input_losses = {}
They empty both losses and per input losses. Does this mean the per input losses are also double now?
https://github.com/matterport/Mask_RCNN/pull/1896/files#diff-312c7e001d14bbb7ce5f8978f7b04cc3R2171
Hey,
I am facing the same error. Could you tell me how you solved it?
I think I found a solution or workaround... change
# First, clear previously set losses to avoid duplication
self.keras_model._losses = []
self.keras_model._per_input_losses = {}
to
# First, clear previously set losses to avoid duplication
try:
    self.keras_model._losses.clear()
except AttributeError:
    pass
try:
    self.keras_model._per_input_losses.clear()
except AttributeError:
    pass
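The reason for calling `.clear()` rather than rebinding the attribute to a fresh container: tf.keras wraps these tracked attributes, and other references to the same container keep pointing at the old object when you rebind. A plain-Python illustration (variable names are illustrative only):

```python
# In-place clear vs. rebinding: only the former is seen through aliases,
# which is why the workaround uses .clear() on the tracked containers.
tracked = [1, 2, 3]
alias = tracked          # the framework holds another reference to the list
tracked.clear()          # in-place: both references now see the empty list
print(alias)             # → []

tracked2 = [1, 2, 3]
alias2 = tracked2
tracked2 = []            # rebinding: alias2 still sees the old contents
print(alias2)            # → [1, 2, 3]
```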
and also change a few lines afterwards from:
for name in loss_names:
    layer = self.keras_model.get_layer(name)
    if layer.output in self.keras_model.losses:
        continue
    loss = (tf.reduce_mean(layer.output, keepdims=True)
            * self.config.LOSS_WEIGHTS.get(name, 1.))
    self.keras_model.add_loss(loss)
to
existing_layer_names = []
for name in loss_names:
    layer = self.keras_model.get_layer(name)
    if layer is None or name in existing_layer_names:
        continue
    existing_layer_names.append(name)
    loss = (tf.reduce_mean(layer.output, keepdims=True)
            * self.config.LOSS_WEIGHTS.get(name, 1.))
    self.keras_model.add_loss(loss)
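The rewritten loop skips duplicates by name instead of checking membership of `layer.output` in `self.keras_model.losses` (the check that started failing on TF 2.x). A stand-alone sketch of that dedup pattern, with a plain dict simulating layer lookup (names and values are hypothetical):

```python
# Dedup-by-name pattern from the workaround above, with a dict standing in
# for self.keras_model.get_layer(). Missing layers and repeats are skipped.
layers = {"rpn_class_loss": 0.2, "mrcnn_mask_loss": 0.5}
loss_names = ["rpn_class_loss", "mrcnn_mask_loss",
              "rpn_class_loss",    # duplicate: must be skipped
              "missing_loss"]      # not in the model: must be skipped

added = []
seen = []
for name in loss_names:
    layer = layers.get(name)
    if layer is None or name in seen:
        continue
    seen.append(name)
    added.append(layer)

print(added)  # → [0.2, 0.5]
```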
as well as
self.keras_model.metrics_tensors.append(loss)
to
self.keras_model.add_metric(loss, name=name, aggregation='mean')
After making this change, I get an error: "unhashable type: ListWrapper". Not sure how to proceed after this.
@sampath9875 Did you end up finding a working solution?
@sampath9875 Did you end up finding a working solution?
No. I decided to try Detectron's version of Mask RCNN.
Does this work on Apple Silicon?
I believe it should, provided all the required packages are installed. Detectron2 is built on PyTorch, and it also requires a whole list of additional packages.
I think the offending lines might be where these protected variables of keras_model are accessed directly:
self.keras_model._losses = []
self.keras_model._per_input_losses = {}
Removing those allowed me to proceed with training without setting those empty lambdas.
It works perfectly. Thanks a lot!
Python 3.6, TensorFlow 2.1.0rc0, Keras 2.2.4-tf
After starting training: