keras-team / keras-cv

Industry-strength Computer Vision workflows with Keras

Inconsistent implementations of RegNet between keras.applications and keras-cv #1410

Open Frightera opened 1 year ago

Frightera commented 1 year ago

Are you willing to contribute?: Yes

Describe the problem: The classification head implementations in keras-cv and keras.applications differ.

From keras.applications: https://github.com/keras-team/keras/blob/fa5c889334c9821ebee1371417ba3bb02eec704d/keras/applications/regnet.py#L836-L854

def Head(num_classes=1000, name=None):
    """Implementation of classification head of RegNet.
    Args:
      num_classes: number of classes for Dense layer
      name: name prefix
    Returns:
      Output logits tensor.
    """
    if name is None:
        name = str(backend.get_uid("head"))

    def apply(x):
        x = layers.GlobalAveragePooling2D(name=name + "_head_gap")(x)
        x = layers.Dense(num_classes, name=name + "head_dense")(x)
        return x

    return apply

From keras-cv-models: https://github.com/keras-team/keras-cv/blob/1411bf2948195d7efa0ffa73b9f1f2a85caeae46/keras_cv/models/regnet.py#L655-L673

Both docstrings have the following:

"""Implementation of classification head of RegNet.
    Args:
      classes: number of classes for Dense layer
      name: name prefix
    Returns:
      Output logits tensor.
    """

According to the docstring, the head is supposed to return logits, so we should not pass classifier_activation to Head(); once softmax is applied there, the model no longer produces logits.
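To make the distinction concrete, here is a minimal plain-NumPy sketch (not the Keras implementation) of why this matters: softmax outputs are normalized probabilities, so a loss configured for raw logits (e.g. `from_logits=True`) would receive the wrong values.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, -1.0, 0.5]])  # unbounded raw scores
probs = softmax(logits)                # normalized probabilities

# probs sum to 1 along the class axis; logits in general do not,
# so the two outputs are not interchangeable downstream.
```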


The original implementation also has a linear layer at the end: https://github.com/facebookresearch/pycls/blob/0ddcc2b25607c7144fd6c169d725033b81477223/pycls/models/anynet.py#L54-L73

class AnyHead(Module):
    """AnyNet head: optional conv, AvgPool, 1x1."""

    def __init__(self, w_in, head_width, num_classes):
        super(AnyHead, self).__init__()
        self.head_width = head_width
        if head_width > 0:
            self.conv = conv2d(w_in, head_width, 1)
            self.bn = norm2d(head_width)
            self.af = activation()
            w_in = head_width
        self.avg_pool = gap2d(w_in)
        self.fc = linear(w_in, num_classes, bias=True)

    def forward(self, x):
        x = self.af(self.bn(self.conv(x))) if self.head_width > 0 else x
        x = self.avg_pool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x
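For reference, with `head_width == 0` the forward pass above reduces to global average pooling followed by a plain linear layer, i.e. the output is unbounded logits. A rough NumPy sketch of that path (function and variable names are mine, not from pycls):

```python
import numpy as np

def any_head_forward(x, w_fc, b_fc):
    # x: (N, H, W, C) feature map; the optional conv branch is skipped
    # (head_width == 0), matching the default RegNet configuration.
    pooled = x.mean(axis=(1, 2))   # global average pooling -> (N, C)
    return pooled @ w_fc + b_fc    # plain linear layer -> logits, no activation

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 7, 7, 8))
w_fc = rng.standard_normal((8, 5))
b_fc = np.zeros(5)
out = any_head_forward(x, w_fc, b_fc)  # shape (2, 5), unbounded values
```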

In the Keras repo I wrote a test for the classifier activation of the pretrained models and excluded the RegNet models: https://github.com/keras-team/keras/blob/fa5c889334c9821ebee1371417ba3bb02eec704d/keras/applications/applications_test.py#L194-L202

    @parameterized.parameters(MODEL_LIST)
    def test_application_classifier_activation(self, app, _):
        if "RegNet" in app.__name__:
            self.skipTest("RegNet models do not support classifier activation")
        model = app(
            weights=None, include_top=True, classifier_activation="softmax"
        )
        last_layer_act = model.layers[-1].activation.__name__
        self.assertEqual(last_layer_act, "softmax")

Question: Which one is correct? I expect both implementations of the classification head to be the same.

innat commented 1 year ago

cc @AdityaKane2001