keras-team / tf-keras

The TensorFlow-specific implementation of the Keras API, which was the default Keras from 2019 to 2023.

InvalidArgumentError: Value for attr 'dtype' of bfloat16 is not in the list of allowed values: uint8, int32, int64, half, float, double #278

Closed · innat closed this issue 1 year ago

innat commented 1 year ago

System information.

TensorFlow version: 2.4.1 (reproduced on Kaggle TPU; see discussion below).

Describe the problem.

With the mixed precision technique on TPU, the following random augmentation layers from Keras raise the error below.

InvalidArgumentError: Value for attr 'dtype' of bfloat16 is not in the list of allowed values: uint8, int32, int64, half, float, double


Standalone code to reproduce the issue.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import RandomZoom
from tensorflow.keras.layers import RandomRotation
from tensorflow.keras.layers import RandomCrop

# Connect to the TPU and enable mixed precision (bfloat16).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)
keras.mixed_precision.set_global_policy("mixed_bfloat16")

# Some data in batch
# Pass to Random*
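
For completeness, a minimal sketch of the last two steps (the dummy batch of ones is an assumption, not part of the original report):

# Hypothetical stand-in for "some data in batch".
images = tf.ones((8, 224, 224, 3))

# Passing the batch through any of the Random* layers while the
# mixed_bfloat16 policy is active triggers the reported InvalidArgumentError.
augmented = RandomZoom(0.2)(images, training=True)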


johnsnow9964 commented 1 year ago

Can we use float32 instead of bfloat16 in keras.mixed_precision.set_global_policy("mixed_bfloat16")?

innat commented 1 year ago

Why would I do that if I want to enable mixed precision on TPU?

johnsnow9964 commented 1 year ago

Why would I do that if I want to enable mixed precision on TPU?

You're right. In that case, can we try converting the data before and after passing it to the data augmentation layers?
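
For example, something like this (just a rough sketch; `images` stands in for the batch):

# Cast the batch up to float32 before the augmentation layers ...
images_f32 = tf.cast(images, tf.float32)
augmented = RandomRotation(0.1)(images_f32, training=True)
# ... and back to bfloat16 afterwards so the rest of the model stays in mixed precision.
augmented = tf.cast(augmented, tf.bfloat16)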

innat commented 1 year ago

Why would I do that if I want to enable mixed precision on TPU?

You're right. In that case, can we try converting the data before and after passing it to the data augmentation layers?

IMO that's not a good design; this should be supported by the API.

SuryanarayanaY commented 1 year ago

Hi @innat ,

Since you mentioned TF version 2.4.1, can you confirm whether the error persists in the latest versions? I tried the code snippet with TF 2.11 and no such error arises. Please refer to the attached gist-2.11v. Can you confirm whether the code snippet you provided is enough to replicate the mentioned behaviour, or do we need some more data?

I tried with TF 2.4.1, but it seems the path is not correct as per gist-2.4.1, and hence it did not exactly replicate the reported behaviour.

However, since version 2.4.1 is not actively supported, and there seems to be no problem with TF 2.11, there is very little chance (almost nil) that a fix will be cherry-picked for 2.4.1. So my request is that you please confirm whether the reported issue persists in the latest versions.

innat commented 1 year ago

@SuryanarayanaY Thanks for the response. I tried to start a TPU on Colab, but each time it shows me "No backend with TPU available". However, I could test the code on a Kaggle TPU, but it still provides TF 2.4.1. Here is the gist that I ran on TPU (TF 2.4.1).

SuryanarayanaY commented 1 year ago

Hi @innat ,

Thanks for the confirmation and the repro code. I tested the code with TF 2.4.1 and couldn't find any such error in my gist.

I have also tested the code with TF 2.11 and it executes fine without error. Please refer to the attached gist-tf2.11v.

Is this code snippet enough to replicate the error? If so, it has probably been resolved already, as my Colab gist runs the code successfully without error. I'm not sure about the Kaggle environment, though. Could you please check and confirm?

Thank you!

innat commented 1 year ago

@SuryanarayanaY Thanks for checking. In your gist the code is incomplete; that's why you didn't hit the reported error. I'm placing the full code here.

from packaging.version import parse
import tensorflow as tf
from tensorflow import keras

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)
print("All devices: ", tf.config.list_logical_devices('TPU'))
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")
tf.__version__

if parse(tf.__version__) < parse('2.6.2'):
    from tensorflow.keras.layers.experimental.preprocessing import Resizing
    from tensorflow.keras.layers.experimental.preprocessing import RandomFlip
    from tensorflow.keras.layers.experimental.preprocessing import RandomZoom
    from tensorflow.keras.layers.experimental.preprocessing import RandomRotation
else:
    from tensorflow.keras.layers import Resizing
    from tensorflow.keras.layers import RandomFlip
    from tensorflow.keras.layers import RandomZoom
    from tensorflow.keras.layers import RandomRotation

INP_SIZE = 224

# Preprocessing
data_preprocessing = keras.Sequential(
    [
        Resizing(
            *[INP_SIZE] * 2, 
            interpolation="bilinear"
        ),
    ], 
    name='PreprocessingLayers'
)

# Augmentation
data_augmentations = keras.Sequential(
    [
        RandomFlip("horizontal"),
        # Doesn't work on TPU with mixed precision (TF 2.4.1)
        # ticket: https://github.com/keras-team/tf-keras/issues/278
        RandomZoom(0.2, fill_mode='reflect'),
        RandomRotation(0.1, fill_mode='reflect'),
    ],
    name='AugmentationLayers'
)
# Define Sequential model with 3 layers
model = keras.Sequential(
    [
        keras.layers.InputLayer(input_shape=tuple([INP_SIZE] * 2) + (3,)),
        data_preprocessing,
        data_augmentations,
        keras.applications.EfficientNetB0(include_top=False, pooling='avg')
    ]
)
# Call model on a test input
x = tf.ones((1, INP_SIZE, INP_SIZE, 3))
y = model(x)
y.shape

SuryanarayanaY commented 1 year ago

Hi @innat ,

I am getting a different error, shown below; please also refer to the attached gist. Could you please provide the exact code snippet to replicate the reported behaviour?

NotImplementedError: Cannot convert a symbolic Tensor (AugmentationLayers/random_zoom/zoom_matrix/strided_slice:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported

innat commented 1 year ago

@SuryanarayanaY

Sorry for the inconvenience. The model should be built inside the strategy scope, as follows. Please re-check.

with strategy.scope():
    model = ...
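
I.e., something like this (a sketch assembling the pieces from the snippet above):

with strategy.scope():
    # Same Sequential model as above, now created under the TPU strategy scope.
    model = keras.Sequential(
        [
            keras.layers.InputLayer(input_shape=(INP_SIZE, INP_SIZE, 3)),
            data_preprocessing,
            data_augmentations,
            keras.applications.EfficientNetB0(include_top=False, pooling='avg'),
        ]
    )
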
innat commented 1 year ago

@SuryanarayanaY I've run the gist on Colab (with TF 2.11 on TPU). The error is reproduced. Please check.

SuryanarayanaY commented 1 year ago

Hi @innat , I am able to replicate the issue with TF 2.11. Please refer to the attached gist.

SuryanarayanaY commented 1 year ago

Hi @innat ,

It seems tf.random_zoom is not supported on TPU, which is the reason for the error. Please refer to the source for the list of TensorFlow ops that are supported on TPU; tf.random_zoom is not in the list.

innat commented 1 year ago

Thanks for the hint. I didn't check all the random_* ops.

(Shallow comment) I think there is no API like tf.random_zoom; it's constructed. WDYT?

google-ml-butler[bot] commented 1 year ago

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

innat commented 1 year ago

.

SuryanarayanaY commented 1 year ago

Hi @innat ,

Sorry for not being specific. Most of the image augmentation layers use ops from the tf.image module or other TF ops internally in one way or another. If those ops are not supported on TPU, then the layer is also likely to fail on TPU. I hadn't specified the complete path earlier, but in any case there is no op random_zoom even in the tf.image module; it is a constructed layer only, which is not supported on TPU.

As per the source attached in the above comment-1446771123, none of the tf.keras.layers.Random* layers is included among the TPU-supported ops.
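
One commonly suggested workaround (just a sketch, not verified here) is to apply the augmentation inside the tf.data input pipeline, which runs on the host CPU rather than on the TPU, so the TPU only sees already-augmented tensors:

# Hypothetical input pipeline; `images` is assumed to be a float32 tensor
# of shape (N, H, W, 3). The augmentation executes on the host CPU.
augmenter = keras.Sequential([RandomFlip("horizontal"), RandomZoom(0.2)])

dataset = (
    tf.data.Dataset.from_tensor_slices(images)
    .batch(8)
    .map(lambda x: augmenter(x, training=True),
         num_parallel_calls=tf.data.AUTOTUNE)
)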

innat commented 1 year ago

@SuryanarayanaY Actually, these augmentation layers work on TPU without any issue until mixed precision (bfloat16) is enabled. So these ops are supported on TPU, just not with mixed precision. WDYT?
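
One thing that might be worth trying (a sketch, not verified on TPU): keep the global mixed_bfloat16 policy but pin the augmentation layers to float32 via their dtype argument, so their internal ops never see bfloat16 inputs:

# Global policy stays mixed_bfloat16; only the augmentation layers are
# forced to compute in float32 (dtype sets the per-layer policy).
data_augmentations = keras.Sequential(
    [
        RandomFlip("horizontal", dtype="float32"),
        RandomZoom(0.2, fill_mode="reflect", dtype="float32"),
        RandomRotation(0.1, fill_mode="reflect", dtype="float32"),
    ],
    name="AugmentationLayers",
)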

rchao commented 1 year ago

Cc'ing @reedwm. Reed, do you have a suggestion for how to move forward with this?

google-ml-butler[bot] commented 1 year ago

Are you satisfied with the resolution of your issue?