keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Can't predict after training on TPU. #19905

Open h4ck4l1 opened 1 week ago

h4ck4l1 commented 1 week ago

I get this error when trying to predict on a TFRecord dataset.

Error Message:


---------------------------------------------------------------------------
OperatorNotAllowedInGraphError            Traceback (most recent call last)
[<ipython-input-13-ac225234dc28>](https://localhost:8080/#) in <cell line: 3>()
      1 TEST_STEPS = 1674896
      2 
----> 3 model.predict(test_ds,steps=TEST_STEPS,use_multiprocessing=True)

1 frames
[/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/polymorphic_function/autograph_util.py](https://localhost:8080/#) in autograph_handler(*args, **kwargs)
     50     except Exception as e:  # pylint:disable=broad-except
     51       if hasattr(e, "ag_error_metadata"):
---> 52         raise e.ag_error_metadata.to_exception(e)
     53       else:
     54         raise

OperatorNotAllowedInGraphError: in user code:

    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 2446, in predict_function  *
        outputs = step_function(self, iterator)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 2425, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 2413, in run_step
        outputs = model.predict_step(data)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 2381, in predict_step
        return self(x, training=False)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None

    OperatorNotAllowedInGraphError: Exception encountered when calling layer 'weighted_attention_transformer' (type WeightedAttentionTransformer).

    in user code:

        File "<ipython-input-9-b9994ab6ce04>", line 85, in call  *
            smiles,tokens = inputs

        OperatorNotAllowedInGraphError: Iterating over a symbolic `tf.Tensor` is not allowed. You can attempt the following resolutions to the problem: If you are running in Graph mode, use Eager execution mode or decorate this function with @tf.function. If you are using AutoGraph, you can try decorating this function with @tf.function. If that does not work, then you may be using an unsupported feature or your source code may not be visible to AutoGraph. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/autograph/g3doc/reference/limitations.md#access-to-source-code for more information.

    Call arguments received by layer 'weighted_attention_transformer' (type WeightedAttentionTransformer):
      • inputs=tf.Tensor(shape=(128,), dtype=float32)

My inputs are of shape ([BATCH_SIZE, 1024], [BATCH_SIZE, 142]), and this is the notebook

Explanation:

I am using an encoder-decoder transformer. I have checked for shape mismatch issues, but none came up.

Thank you in advance for any feedback.

h4ck4l1 commented 1 week ago

If this is a unique issue and you want to run it in your own environment, I can make the dataset public.

edit: made it public.

edit2: All you need to do is add two secrets: one with key "username" whose value is your Kaggle username, and one with key "key" whose value is the unique key from your kaggle.json. Run it, and the notebook will stop at the predict step.

h4ck4l1 commented 1 week ago

Wait, I solved it, but not exactly, because I ended up with more doubts. I zipped my test dataset with dummy y values, and then predict works properly. Why is that?

The snippet below does not work at all; it gives me an improper-shape warning.

BATCH_SIZE = 256
TEST_STEPS = len(test.tfrecord) // BATCH_SIZE

test_ds = (
    tf.data.TFRecordDataset("belka-tfrecords/test_10.tfrecord", compression_type="GZIP", num_parallel_reads=AUTO)
    .map(test_belka_example, num_parallel_calls=AUTO)
)

X_test, y_test = total_test_ds.take(1).get_single_element()
print("test molecule smiles: ", X_test[0].shape)  # (256, 1024)
print("test molecule tokens: ", X_test[1].shape)  # (256, 142)

model.predict(test_ds, steps=TEST_STEPS, use_multiprocessing=True)

The snippet below works...

BATCH_SIZE = 256
TEST_STEPS = len(test.tfrecord) // BATCH_SIZE

test_ds = (
    tf.data.TFRecordDataset("belka-tfrecords/test_10.tfrecord", compression_type="GZIP", num_parallel_reads=AUTO)
    .map(test_belka_example, num_parallel_calls=AUTO)
)

dummy_y = (
    tf.data.Dataset.from_tensor_slices(tf.random.uniform(shape=[1674896], minval=0, maxval=2, dtype=tf.int32))
)

total_test_ds = tf.data.Dataset.zip((test_ds, dummy_y)).batch(256, num_parallel_calls=AUTO)

X_test, y_test = total_test_ds.take(1).get_single_element()
print("test molecule smiles: ", X_test[0].shape)  # (256, 1024)
print("test molecule tokens: ", X_test[1].shape)  # (256, 142)
print("test y: ", y_test.shape)  # (256,)

model.predict(total_test_ds, steps=TEST_STEPS, use_multiprocessing=True)
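For what it's worth, an alternative to zipping in dummy labels may be to wrap the two inputs in a single-element tuple inside `.map`, so Keras reads the whole pair as `x` with no `y`. A minimal sketch with toy tensors standing in for the parsed `(smiles, tokens)` elements (shapes taken from this issue; `test_belka_example` is assumed to return that 2-tuple):

```python
import tensorflow as tf

# Toy stand-in for the parsed test dataset: each element is a
# (smiles, tokens) 2-tuple, as test_belka_example is assumed to return.
ds = tf.data.Dataset.from_tensor_slices(
    (tf.zeros([4, 1024]), tf.zeros([4, 142]))
)

# Wrap the pair in a 1-tuple so Keras treats the whole pair as the
# model input x, rather than reading it as (x=smiles, y=tokens):
wrapped = ds.map(lambda smiles, tokens: ((smiles, tokens),))

print(wrapped.element_spec)
```

`model.predict(wrapped.batch(BATCH_SIZE))` should then deliver both tensors to `call()` without any dummy `y`.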

Why does a test dataset without y values not work directly?
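The behavior above is consistent with Keras's input-unpacking convention: `model.predict` treats each dataset element as an `(x, y, sample_weight)` tuple, so a bare 2-tuple `(smiles, tokens)` is read as `x=smiles, y=tokens`, and the layer's `call()` receives only one tensor, which is why `smiles, tokens = inputs` then fails. A simplified, stand-alone sketch of that rule (modeled loosely on Keras's internal `unpack_x_y_sample_weight` helper; the function body here is illustrative, not the actual implementation):

```python
def unpack_x_y_sample_weight(data):
    """Simplified sketch of how Keras interprets a dataset element."""
    if isinstance(data, list):
        data = tuple(data)
    if not isinstance(data, tuple):
        return (data, None, None)        # bare x
    if len(data) == 1:
        return (data[0], None, None)     # (x,)
    if len(data) == 2:
        return (data[0], data[1], None)  # (x, y)
    if len(data) == 3:
        return tuple(data)               # (x, y, sample_weight)
    raise ValueError("Unexpected element structure")

# A bare (smiles, tokens) pair: tokens is mistaken for labels,
# so the model is called with only the smiles tensor.
x, y, _ = unpack_x_y_sample_weight(("smiles", "tokens"))
print(x, y)  # smiles tokens

# Zipped with a dummy y: the whole pair becomes x, so call()
# can unpack it, which is why the workaround above succeeds.
x, y, _ = unpack_x_y_sample_weight((("smiles", "tokens"), "dummy_y"))
print(x)  # ('smiles', 'tokens')
```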