tensorflow / hub

A library for transfer learning by reusing parts of TensorFlow models.
https://tensorflow.org/hub
Apache License 2.0

Bug: [TFHub BERT/Small BERT] InvalidArgumentError on not setting random seed for determinism, even if set #862

Closed piri-p closed 1 year ago

piri-p commented 2 years ago

What happened?

I was trying to fine-tune BERT from TFHub for a text classification task. The run with trainable=True crashes with

InvalidArgumentError: When determinism is enabled, random ops must have a seed specified.

[[{{node dropout/dropout/random_uniform/RandomUniform}}]] [Op:__inference_train_function_XXXX]

I have already tried setting the seeds, and also not enabling determinism mode, but to no avail. This fails for other BERT models on TFHub as well.

It does not crash with trainable=False.

Relevant code

# set seed
import random

import numpy as np
import tensorflow as tf

random_seed = 12
np.random.seed(random_seed)
random.seed(random_seed)
tf.experimental.numpy.random.seed(random_seed)
tf.keras.utils.set_random_seed(random_seed)
tf.random.set_seed(random_seed)
tf.config.experimental.enable_op_determinism()  # still crashes even if this line is removed

# model code
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import RMSprop

pp_source = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
model_source = "https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/2"

input = Input(
    shape=(),
    dtype=tf.string
    )

pp_layer = hub.KerasLayer(pp_source)

x = pp_layer(input)

hub_layer = hub.KerasLayer(
    model_source,
    trainable=True  # True causes the error; False is ok
    )

x = hub_layer(x)["pooled_output"]

y = Dense(
    num_class,  # num_class: number of target classes, defined elsewhere
    activation="softmax"
    )(x)

model = Model(
    inputs=[input],
    outputs=[y]
    )

model.compile(
    loss="categorical_crossentropy",
    optimizer=RMSprop(learning_rate=0.001),
    metrics=["categorical_accuracy"]
    )

Relevant log output

Traceback (most recent call last):
  File "transformer_model_builder.py", line 318, in <module>
    model.fit(
  File "/home/user/miniconda3/envs/conda-env/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/user/miniconda3/envs/conda-env/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

When determinism is enabled, random ops must have a seed specified.
     [[{{node dropout/dropout/random_uniform/RandomUniform}}]] [Op:__inference_train_function_23160]

tensorflow_hub Version

0.12.0 (latest stable release)

TensorFlow Version

other (please specify)

Other libraries

tensorflow==2.9.0 tensorflow-text==2.9.0

Python 3.9 under Conda 4.12.0; all TensorFlow packages installed via pip

Python Version

3.x

OS

Linux

singhniraj08 commented 2 years ago

@piriyacoco,

Instead of using trainable=True, load the model with trainable=False, then unfreeze the model and set the bottom layers to non-trainable, as mentioned here. You can also refer here for fine-tuning the BERT model.

Please let us know if this helps. Thank you!

piri-p commented 2 years ago

[Update] Dear @singhniraj08, thank you again for the swift response! I followed the instructions, switching trainable to False and putting the following snippet before re-compiling the model, but it raises exactly the same error.

        hub_layer = hub.KerasLayer(
            model_source, 
            trainable=False # per https://github.com/tensorflow/hub/issues/862
            )
...
...
...

        model.trainable = True

        # Fine-tune starting from BERT encoder layer (Layer 2)
        fine_tune_at = 2

        # Freeze all the layers before the `fine_tune_at` layer
        for layer in model.layers[:fine_tune_at]:
            layer.trainable = False

        for layer in model.layers[fine_tune_at:]:
            layer.trainable = True

singhniraj08 commented 2 years ago

@piriyacoco,

I tried the same code with the TF versions below and I cannot reproduce the error. I am loading the model with the trainable=True parameter. Please check the gist here. Thank you!

tensorflow==2.9.0 tensorflow-text==2.9.0 tensorflow-hub==0.12.0 Python==3.7.3

piri-p commented 2 years ago

Dear @singhniraj08

Thank you for checking! I have just made a gist here to reproduce the error. The error occurs when fitting the model.

tensorflow==2.9.0 tensorflow-text==2.9.0 tensorflow-hub==0.12.0 Python==3.7.13

gaikwadrahul8 commented 2 years ago

@piriyacoco

I tried the same code on Google Colab to reproduce the error, and with the versions below it runs fine without the error, so kindly try the library versions below; I hope they will work for you as well. For your reference, I have also added a gist here.

I have also added a learning resource here for NLP use cases with TensorFlow; I hope it helps with your learning and your NLP use case.

Here are the version details for installing the libraries:

tensorflow==2.8.0
tensorflow-text==2.8.2
tensorflow-hub==0.10.0
Python==3.7.15

If the issue still persists, please let us know. Thank you!

piri-p commented 2 years ago

Hi @gaikwadrahul8

Thank you for your response! So you are using TF version 2.8, right? Do you think the issue is with TF version 2.9, which I used, or what exactly is the problem? I am just trying to understand here :)

gaikwadrahul8 commented 2 years ago

@piriyacoco

I noticed there is a version compatibility issue between TF Hub, the TF version, and the NLP pre-trained models on TF Hub, so as a workaround I would suggest you go with tensorflow==2.8.0 for your use case; everything should then work fine for you.

Could you please confirm whether this issue is resolved for you? Please feel free to close the issue if it is resolved.

Thank you!
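For what it's worth, the pinned environment suggested above could be installed with something like the following (the version pins are taken from this thread, not from an officially published compatibility matrix):

```shell
# Pin tensorflow, tensorflow-text and tensorflow-hub to the versions
# reported to work together in this thread (tf-text tracks the same
# major.minor release as tensorflow).
pip install "tensorflow==2.8.0" "tensorflow-text==2.8.2" "tensorflow-hub==0.10.0"
```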

gaikwadrahul8 commented 1 year ago

Hi, @piriyacoco

Update: While using BERT preprocessing from TFHub, the TensorFlow and tensorflow_text versions should be the same, so please make sure both installed versions match. The error happens because you're using the latest version of tensorflow_text together with different versions of Python and TensorFlow, but there is an internal dependency between TensorFlow and tensorflow_text that requires their versions to match.
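As a quick sanity check for the pairing described above, one could compare the installed tensorflow and tensorflow_text versions at the major.minor level before loading any TFHub preprocessing model. A minimal sketch (the version strings in the example calls are illustrative, not from this thread):

```python
def versions_compatible(tf_version: str, tf_text_version: str) -> bool:
    """Return True when tensorflow and tensorflow_text share the same
    major.minor release, which is the pairing this thread recommends."""
    tf_mm = tuple(tf_version.split(".")[:2])
    text_mm = tuple(tf_text_version.split(".")[:2])
    return tf_mm == text_mm

# In a real environment one would pass tf.__version__ and
# tensorflow_text.__version__; hypothetical strings are used here:
print(versions_compatible("2.9.0", "2.9.0"))  # True  (matching pair)
print(versions_compatible("2.9.0", "2.8.2"))  # False (mismatched pair)
```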

Please refer to a similar issue on Stack Overflow here. If you have any further questions or need any further assistance, please re-open this issue.

Closing this issue due to a lack of recent activity for a couple of weeks. Please feel free to reopen it with more details if the problem still persists after trying the workaround above. Thank you!