tensorflow / addons

Useful extra functionality for TensorFlow 2.x maintained by SIG-addons
Apache License 2.0

Error caused by SigmoidFocalCrossEntropy with kernel regularizer #2349

Open oO0oO0oO0o0o00 opened 3 years ago

oO0oO0oO0o0o00 commented 3 years ago

System information

Describe the bug

I set an L2 kernel regularizer on some of the (Keras) layers and used tfa.losses.SigmoidFocalCrossEntropy() as the loss function. After the model was built and compiled, calling model.fit raised the following exception:

ValueError: Shapes must be equal rank, but are 1 and 0 From merging shape 0 with other shapes. for '{{node AddN}} = AddN[N=2, T=DT_FLOAT](sigmoid_focal_crossentropy/weighted_loss/Mul, d1_7/kernel/Regularizer/add)' with input shapes: [?], [].

The full stack trace is long, so it is appended at the end.

Code to reproduce the issue

Run the following code:

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense
import tensorflow_addons as tfa

model = keras.Sequential([
    Dense(5, activation='relu', kernel_regularizer='l2', name='d1', input_shape=(12,)),
    Dense(5, activation='softmax', name='dout')
])
model.compile(optimizer='adam', loss=tfa.losses.SigmoidFocalCrossEntropy(), metrics=['accuracy'])
model.summary()
# random data with desired shape was used to help with faster reproduction
model.fit(np.random.randn(64, 12), tf.one_hot(np.random.randint(0,5,64),5))

This raises the exception shown above. After removing kernel_regularizer='l2', the exception disappears and the training progress bar appears as expected.

Other info / logs

Full stack trace: (You may want to skip it)

ValueError                                Traceback (most recent call last)
<ipython-input-3-cd3ce786e484> in <module>
     11 model.compile(optimizer='adam', loss=tfa.losses.SigmoidFocalCrossEntropy(), metrics=['accuracy'])
     12 model.summary()
---> 13 model.fit(np.random.randn(64, 12), np.random.randint(0,5,64))

~/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in _method_wrapper(self, *args, **kwargs)
     64   def _method_wrapper(self, *args, **kwargs):
     65     if not self._in_multi_worker_mode():  # pylint: disable=protected-access
---> 66       return method(self, *args, **kwargs)
     67 
     68     # Running inside `run_distribute_coordinator` already.

~/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
    846                 batch_size=batch_size):
    847               callbacks.on_train_batch_begin(step)
--> 848               tmp_logs = train_function(iterator)
    849               # Catch OutOfRangeError for Datasets of unknown size.
    850               # This blocks until the batch has finished executing.

~/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    578         xla_context.Exit()
    579     else:
--> 580       result = self._call(*args, **kwds)
    581 
    582     if tracing_count == self._get_tracing_count():

~/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
    625       # This is the first call of __call__, so we have to initialize.
    626       initializers = []
--> 627       self._initialize(args, kwds, add_initializers_to=initializers)
    628     finally:
    629       # At this point we know that the initialization is complete (or less

~/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py in _initialize(self, args, kwds, add_initializers_to)
    503     self._graph_deleter = FunctionDeleter(self._lifted_initializer_graph)
    504     self._concrete_stateful_fn = (
--> 505         self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
    506             *args, **kwds))
    507 

~/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/eager/function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
   2444       args, kwargs = None, None
   2445     with self._lock:
-> 2446       graph_function, _, _ = self._maybe_define_function(args, kwargs)
   2447     return graph_function
   2448 

~/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
   2775 
   2776       self._function_cache.missed.add(call_context_key)
-> 2777       graph_function = self._create_graph_function(args, kwargs)
   2778       self._function_cache.primary[cache_key] = graph_function
   2779       return graph_function, args, kwargs

~/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   2655     arg_names = base_arg_names + missing_arg_names
   2656     graph_function = ConcreteFunction(
-> 2657         func_graph_module.func_graph_from_py_func(
   2658             self._name,
   2659             self._python_function,

~/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    979         _, original_func = tf_decorator.unwrap(python_func)
    980 
--> 981       func_outputs = python_func(*func_args, **func_kwargs)
    982 
    983       # invariant: `func_outputs` contains only Tensors, CompositeTensors,

~/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py in wrapped_fn(*args, **kwds)
    439         # __wrapped__ allows AutoGraph to swap in a converted function. We give
    440         # the function a weak reference to itself to avoid a reference cycle.
--> 441         return weak_wrapped_fn().__wrapped__(*args, **kwds)
    442     weak_wrapped_fn = weakref.ref(wrapped_fn)
    443 

~/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    966           except Exception as e:  # pylint:disable=broad-except
    967             if hasattr(e, "ag_error_metadata"):
--> 968               raise e.ag_error_metadata.to_exception(e)
    969             else:
    970               raise

ValueError: in user code:

    /home/omnisky/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:571 train_function  *
        outputs = self.distribute_strategy.run(
    /home/omnisky/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:951 run  **
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /home/omnisky/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2290 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /home/omnisky/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2649 _call_for_each_replica
        return fn(*args, **kwargs)
    /home/omnisky/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:532 train_step  **
        loss = self.compiled_loss(
    /home/omnisky/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/keras/engine/compile_utils.py:238 __call__
        total_loss_metric_value = math_ops.add_n(loss_metric_values)
    /home/omnisky/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:180 wrapper
        return target(*args, **kwargs)
    /home/omnisky/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py:3239 add_n
        return gen_math_ops.add_n(inputs, name=name)
    /home/omnisky/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/ops/gen_math_ops.py:419 add_n
        _, _, _op, _outputs = _op_def_library._apply_op_helper(
    /home/omnisky/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py:742 _apply_op_helper
        op = g._create_op_internal(op_type_name, inputs, dtypes=None,
    /home/omnisky/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py:593 _create_op_internal
        return super(FuncGraph, self)._create_op_internal(  # pylint: disable=protected-access
    /home/omnisky/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:3319 _create_op_internal
        ret = Operation(
    /home/omnisky/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:1816 __init__
        self._c_op = _create_c_op(self._graph, node_def, inputs,
    /home/omnisky/anaconda3/envs/tf2d1/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:1657 _create_c_op
        raise ValueError(str(e))

    ValueError: Shapes must be equal rank, but are 1 and 0
        From merging shape 0 with other shapes. for '{{node AddN}} = AddN[N=2, T=DT_FLOAT](sigmoid_focal_crossentropy/weighted_loss/Mul, d1/kernel/Regularizer/add)' with input shapes: [32], [].

I can provide the full stack trace with run_eagerly=True on request.

Thanks~

WindQAQ commented 3 years ago

Sorry for the late reply. You have to use tfa.losses.SigmoidFocalCrossEntropy(reduction=tf.keras.losses.Reduction.AUTO) to reduce the loss to a scalar. I'm not sure why we made it default to NONE. @AakashKumarNain Could you confirm whether this is an issue or not?
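
To illustrate why the reduction setting matters here, the following is a minimal pure-Python sketch, not the tensorflow-addons implementation. It assumes the focal loss is summed over the class axis to give one value per sample, and that AUTO typically resolves to averaging over the batch. The helper names sigmoid_focal_loss_per_sample and reduce_loss, the "none"/"mean" strings, and the l2_penalty value are hypothetical and exist only for this example.

```python
import math


def sigmoid_focal_loss_per_sample(y_true, y_pred, alpha=0.25, gamma=2.0):
    """Return one focal-loss value per sample (analogue of reduction=NONE).

    y_true and y_pred are lists of rows; each row holds per-class values.
    """
    eps = 1e-7
    per_sample = []
    for t_row, p_row in zip(y_true, y_pred):
        total = 0.0
        for t, p in zip(t_row, p_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            ce = -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            p_t = t * p + (1.0 - t) * (1.0 - p)
            alpha_t = t * alpha + (1.0 - t) * (1.0 - alpha)
            total += alpha_t * (1.0 - p_t) ** gamma * ce
        per_sample.append(total)  # sum over the class axis
    return per_sample


def reduce_loss(per_sample, reduction="none"):
    """Hypothetical stand-in for the loss reduction step."""
    if reduction == "none":
        return per_sample  # still one value per sample
    if reduction == "mean":
        return sum(per_sample) / len(per_sample)  # a single scalar
    raise ValueError(f"unknown reduction: {reduction}")


y_true = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y_pred = [[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]]
l2_penalty = 0.05  # hypothetical scalar regularization term

per_sample = sigmoid_focal_loss_per_sample(y_true, y_pred)
unreduced = reduce_loss(per_sample, "none")  # list with one entry per sample
reduced = reduce_loss(per_sample, "mean")    # plain float

# A scalar penalty can only be added meaningfully to the reduced loss;
# the unreduced value has a different rank, which mirrors the AddN error.
total = reduced + l2_penalty
print(len(unreduced), type(reduced).__name__)
```

In this sketch the unreduced loss has one entry per sample while the regularization term is a single number, which is the same rank mismatch ([32] versus []) that the AddN node reports in the trace.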

bhack commented 3 years ago

We have a little documentation on the reduction parameter here: https://github.com/tensorflow/models/blob/master/official/vision/keras_cv/losses/focal_loss.py#L37

AakashKumarNain commented 3 years ago

@WindQAQ Yes, that needs to be changed to AUTO and we need to make a few other changes as well. But I won't be able to fix it before next week.

bhack commented 3 years ago

I put this in the ecosystem review in the meantime because I want to check how we want to handle these duplicated but not strictly aligned implementations.

AakashKumarNain commented 3 years ago

Agreed

oO0oO0oO0o0o00 commented 3 years ago

Thanks, it works. QAQ

ravinderkhatri commented 2 years ago

I am facing the same issue, however, using tfa.losses.SigmoidFocalCrossEntropy(reduction=tf.keras.losses.Reduction.AUTO) worked like a charm.

https://github.com/tensorflow/models/blob/master/official/vision/keras_cv/losses/focal_loss.py#L37

This link is not working. Can you please share the updated link?

bhack commented 2 years ago

@ravinderkhatri Keras-cv is being refactored.

We have a PR at https://github.com/tensorflow/addons/pull/2422

kynnemall commented 1 year ago

I had the same issue, but setting reduction=tf.keras.losses.Reduction.AUTO fixed it. I'm surprised this isn't the default in tensorflow-addons.

bhack commented 1 year ago

We already have official upstream APIs now: https://github.com/keras-team/keras-cv/issues/1117