leondgarse / keras_efficientnet_v2

self defined efficientnetV2 according to official version. Including converted ImageNet/21K/21k-ft1k weights.
Apache License 2.0
78 stars 19 forks

Performing way worse than EffnetV1 #8

Closed macsunmood closed 2 years ago

macsunmood commented 3 years ago

The issue I have is that, for example, with the same config EffnetV2_B0 performs way worse than the original EfficientNetB0 from tf.keras.applications (~0.70 acc vs. ~0.90). Any quick/obvious reason for that, or is more detailed info needed?

macsunmood commented 3 years ago

Also, I get the following warning at the start of V2 training; could this be associated with the issue?

```
Epoch 1/30

Epoch 00001: LearningRateScheduler reducing learning rate to 0.0010000000474974513.
WARNING:tensorflow:AutoGraph could not transform <function Model.make_train_function.<locals>.train_function at 0x7f772b2fff28> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: closure mismatch, requested ('self', 'step_function'), but source function had ()
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <function Model.make_train_function.<locals>.train_function at 0x7f772b2fff28> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: closure mismatch, requested ('self', 'step_function'), but source function had ()
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
```
macsunmood commented 3 years ago

UPD: I just tried the equivalent model from https://github.com/sebastian-sz/efficientnet-v2-keras and it performs just fine. I'm doing transfer learning and use the include_top=False parameter there; in leondgarse/keras_efficientnet_v2 I use num_classes=0 for the same purpose.

I'm still not sure what causes it, but it looks like there may be a bug in the current implementation.

leondgarse commented 3 years ago

My bad, it's because I accidentally changed a layer name, which made weight loading skip that layer... Sorry about that. You can try again after upgrading: pip install -U keras-efficientnet-v2. Speaking of EfficientNetB0 from tf.keras.applications, it has Rescaling and Normalization layers on top, so it takes values in [0, 255] as input, whereas the efficientnetV2 models expect values in [0, 1].
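The input-range mismatch described here can be sketched with plain arithmetic (the `rescale` helper below is hypothetical, mimicking the affine transform of `keras.layers.Rescaling`; note the [0, 1] claim is corrected to [-1, 1] later in the thread):

```python
def rescale(x, scale, offset=0.0):
    # Same affine transform as keras.layers.Rescaling: x * scale + offset
    return x * scale + offset

# tf.keras.applications EfficientNetB0 rescales/normalizes internally,
# so it takes raw pixel values in [0, 255] as input.
raw_pixel = 255.0

# keras_efficientnet_v2 (as described in this comment) has no such
# layers, so inputs must be scaled beforehand, e.g. to [0, 1]:
scaled = rescale(raw_pixel, scale=1.0 / 255.0)
print(raw_pixel, scaled)  # 255.0 1.0
```

Feeding raw [0, 255] pixels into a model that expects pre-scaled inputs would explain the large accuracy gap reported above.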

macsunmood commented 3 years ago

> Speaking of EfficientNetB0 from tf.keras.applications, it has Rescaling and Normalization layers on top, so it takes values in [0, 255] as input, whereas the efficientnetV2 models expect values in [0, 1].

The maintainer of the https://github.com/sebastian-sz/efficientnet-v2-keras implementation says, quote:

> The models expect image values in range -1:+1.

You say the v2 models expect values in range [0, 1]. Which one is true?

Also, I still have the `WARNING:tensorflow:AutoGraph could not transform <function Model.make_train_function` issue :(. Maybe it's not related to your implementation and has to do with something on my side..

leondgarse commented 3 years ago

Yes, it's [-1, 1], he's right! I made a mistake in my testing: I ran tf.expand_dims(imm / 127 - 127.5, 0) as the input, which gives bad results and made me think it should be [0, 1]... I'm using TF 2.6.0 in my projects and didn't hit that AutoGraph warning. Maybe it's version related? Both sebastian-sz's implementation and mine use only standard Keras layers and return a standard Keras model. Maybe the optimizer?
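The arithmetic slip is easy to verify at the endpoints of the pixel range (`wrong` and `right` are hypothetical names, just illustrating the two transforms mentioned above):

```python
def wrong(x):
    # The accidental transform from the test: x / 127 - 127.5
    return x / 127 - 127.5

def right(x):
    # The intended [-1, 1] rescaling: x / 127.5 - 1
    return x / 127.5 - 1.0

# Endpoints of the [0, 255] pixel range:
print(wrong(0), wrong(255))   # -127.5 and about -125.49: nowhere near [-1, 1]
print(right(0), right(255))   # -1.0 1.0
```

With inputs that far outside the expected range, poor results are unsurprising, which fits the mistaken conclusion described above.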

macsunmood commented 3 years ago

I see you've added an include_preprocessing parameter, which is great, but you're using keras.layers.Rescaling(1.0 / 255.0), whereas https://www.tensorflow.org/api_docs/python/tf/keras/layers/Rescaling says that to rescale to the [-1, 1] range one should pass: scale=1./127.5, offset=-1.

As far as I understand, your version is not quite correct and may hurt accuracy.

UPD: I've just checked, and it doesn't seem to hurt accuracy: I don't see any noticeable difference between the two, each trained for 20 epochs. Could that be expected? And if so, why?

UPD 2: Since the Rescaling (and, I guess, Normalization) Keras layers moved out of the experimental namespace only in the latest TensorFlow releases, I suggest adding support for earlier versions as well, for compatibility reasons.
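A version-tolerant import along the lines suggested here might look as follows (a sketch, not the repo's actual code; the final fallback class is a pure-Python stand-in so the snippet also runs where TensorFlow isn't installed):

```python
try:
    # Newer TF (2.6+): preprocessing layers graduated out of `experimental`
    from tensorflow.keras.layers import Rescaling
except ImportError:
    try:
        # Earlier TF 2.x releases still ship the experimental path
        from tensorflow.keras.layers.experimental.preprocessing import Rescaling
    except ImportError:
        # No TensorFlow available at all; minimal stand-in with the same math
        class Rescaling:
            def __init__(self, scale, offset=0.0):
                self.scale, self.offset = scale, offset

            def __call__(self, x):
                return x * self.scale + self.offset

# Rescale raw [0, 255] pixels to [-1, 1], per the Rescaling docs cited above
to_pm1 = Rescaling(scale=1.0 / 127.5, offset=-1)
```

Whichever import succeeds, the layer applies the same x * scale + offset transform, so model code can stay unchanged across TF versions.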

macsunmood commented 3 years ago

Regarding AutoGraph, I think it may indeed be TF version related, because the server I'm using currently supports at most v2.4.1..

Also, it seems there's a new bug after the latest release update (it wasn't there before): [screenshot of the error attached]

leondgarse commented 3 years ago

I think TF 2.4.1 should be alright. Here's my test result on Colab (efficientnetV2_basic_test.ipynb), since my local CUDA version doesn't match TF 2.4.1. Just a simple V2B0 + Adam on cifar10. [screenshot attached]

macsunmood commented 3 years ago
> • Using keras.layers.experimental.preprocessing.Rescaling and keras.layers.experimental.preprocessing.Normalization now.

Maybe even better, for everlasting compatibility: https://github.com/leondgarse/keras_efficientnet_v2/pull/9

leondgarse commented 3 years ago

Ya, I thought about that, but I was getting lazy. lol