keras-team / tf-keras

The TensorFlow-specific implementation of the Keras API, which was the default Keras from 2019 to 2023.
Apache License 2.0

Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT #103

Open sky712345678 opened 2 years ago

sky712345678 commented 2 years ago

System information.

Describe the problem. (Continuing the issue from tensorflow/tensorflow#57052.) I get a Type inference failed error when running tf.keras.Model.fit() with TensorFlow 2.9 and Keras 2.9. I didn't see this kind of error in version 2.8 with identical code. Although the program doesn't crash, I'm afraid the trained model may be affected.
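For context, a sketch of the pattern involved (hypothetical, not the notebook's exact code): a dice loss containing a data-dependent Python conditional, which AutoGraph lowers to the kind of tf.cond op named in the log below ('dice_loss/cond').

import tensorflow as tf

# Hypothetical dice loss with a data-dependent conditional. Inside
# Model.fit()'s tf.function, AutoGraph converts the `if` into tf.cond,
# the kind of op named in the 'dice_loss/cond/output/_11' log line.
def dice_loss(y_true, y_pred, smooth=1.0):
    y_true = tf.cast(y_true, tf.float32)
    intersection = tf.reduce_sum(y_true * y_pred)
    denom = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred)
    loss = 1.0 - (2.0 * intersection + smooth) / (denom + smooth)
    if tf.reduce_sum(y_true) == 0.0:  # becomes a tf.cond in the graph
        loss = tf.zeros_like(loss)
    return loss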

Describe the current behavior. Run tf.keras.Model.fit() and the error Type inference failed shows up.

Describe the expected behavior. The error shouldn't show up.

Contributing.

Standalone code to reproduce the issue. Link to notebook: https://drive.google.com/file/d/1k78lpGVthB7nthEkYgUs3JNJTuR79r5E/view?usp=sharing To reproduce:

  1. Open the notebook with Google Colab
  2. Run all cells
  3. View the runtime logs

Source code / logs.

2022-08-20 17:18:05.533157: W tensorflow/core/common_runtime/forward_type_inference.cc:231] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_BOOL
    }
  }
}
 is neither a subtype nor a supertype of the combined inputs preceding it:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_LEGACY_VARIANT
    }
  }
}

        while inferring type of node 'dice_loss/cond/output/_11'
sushreebarsa commented 2 years ago

@gadagashwini I was able to replicate the issue on colab, please find the gist here. Thank you!

gadagashwini commented 2 years ago

Hi @sky712345678, W tensorflow/core/common_runtime/forward_type_inference.cc:231] Type inference failed. is just a warning; you can safely ignore it. The given code executed without any error message. Thank you!

tgsmith61591 commented 1 year ago

@gadagashwini what's the point of a warning if the response is simply "you can safely ignore it"? It's clearly there for a reason.

google-ml-butler[bot] commented 1 year ago

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

sky712345678 commented 1 year ago

@gadagashwini can you say a bit more about why we can safely ignore it? Thank you!

gowthamkpr commented 1 year ago

@sky712345678 This looks like an issue with TensorFlow itself. Could you please file this issue in tensorflow/tensorflow? Thank you!

foxik commented 1 year ago

@gowthamkpr Well, the problem was first reported in TensorFlow as https://github.com/tensorflow/tensorflow/issues/57052, but the people there told the reporter to post an issue here instead.

If you know why it is a TensorFlow issue, could you please provide details that we can pass back to the TF team?

google-ml-butler[bot] commented 1 year ago

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

foxik commented 1 year ago

@gowthamkpr The issue was originally a TF issue, but we were redirected to post it here. If you know why it is a TF issue and not a Keras one, could you please provide details that we can pass back to the TF team? Thanks!

zrx563010758 commented 1 year ago

Unsubscribe


fchollet commented 1 year ago

The issue is at the level of the dice_loss. Can you try producing a reproduction script that only involves the loss function? Maybe just try to backprop through the loss function and see what happens.

I think this should be reproducible without involving any Keras logic, at which point the TF folks will definitely look at it. But anyway, as said before, this is just a warning, not something critical. You can ignore it.
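A minimal sketch of that suggestion: differentiate the loss by itself with tf.GradientTape, with no Keras model involved (the dice_loss here is a hypothetical stand-in for the notebook's loss).

import tensorflow as tf

# Hypothetical stand-in for the notebook's dice loss.
def dice_loss(y_true, y_pred, smooth=1.0):
    intersection = tf.reduce_sum(y_true * y_pred)
    denom = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)

@tf.function
def loss_grad(y_true, y_pred):
    # Backprop through the loss alone, without any Keras layers.
    with tf.GradientTape() as tape:
        tape.watch(y_pred)
        loss = dice_loss(y_true, y_pred)
    return tape.gradient(loss, y_pred)

y_true = tf.constant([[0.0, 1.0, 1.0, 0.0]])
y_pred = tf.random.uniform((1, 4))
print(loss_grad(y_true, y_pred))  # then check the runtime logs for the warning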

sky712345678 commented 1 year ago

Ok, I got it. Thank you!

I wasn't sure how to reproduce it involving only the loss function; here is my attempt: https://colab.research.google.com/drive/1qxamrOaOqfVANzMnN-u--Sue4iPtJCtf?usp=sharing [screenshot] Running this Colab notebook, I didn't see the error message in the runtime logs.

TuanHAnhVN commented 1 year ago

@sky712345678 Hi, can you share how you got past this issue? I've run into a similar problem. Thank you so much.

isohrab commented 1 year ago

I'm getting the same warning with TF 2.11 when I set mask_zero=True in the embedding layer.
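A minimal sketch of that setup (shapes and sizes are illustrative), which should be enough to check whether the warning appears during fit():

import numpy as np
import tensorflow as tf

# Embedding with mask_zero=True feeding an LSTM; several commenters
# report the type-inference warning from exactly this combination.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=100, output_dim=8, mask_zero=True),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

x = np.random.randint(0, 100, size=(32, 10))
y = np.random.randint(0, 2, size=(32, 1))
model.fit(x, y, epochs=1)  # the warning, if any, appears in the logs here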

alibahmanyar commented 1 year ago

+1; I'm also getting the same warning with TF 2.11 and setting mask_zero=True in the embedding layer while training on GPU. After the warning, the model keeps training and is then saved, but the saved model can't be loaded using keras.models.load_model. However, when I'm training on CPU (even with mask_zero=True) everything works fine and the warning doesn't show up; the model is trained, saved and can be loaded and used again without encountering any problem.
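Not a fix, but a quick round-trip check for the reported load failure, continuing the mask_zero sketch above (model and x as defined there; "masked_model" is an arbitrary path):

model.save("masked_model")  # SavedModel directory
restored = tf.keras.models.load_model("masked_model")  # the step reported to fail after GPU training
print(restored.predict(x[:1]))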

albertz commented 1 year ago

I'm getting something very similar but with pure TF 2.11 on Mac M1. So I really think this is a pure TF issue, and we should reopen the TF issue (https://github.com/tensorflow/tensorflow/issues/57052).

NinaCilliers commented 1 year ago

I have the same issue unfortunately. Currently running with mask_zero set to True and using CPU without issue.

cromicron commented 11 months ago

Hi @sky712345678, W tensorflow/core/common_runtime/forward_type_inference.cc:231] Type inference failed. is just a warning; you can safely ignore it. The given code executed without any error message. Thank you!

Nope, because the execution time increases eight-fold!

iconrnd commented 11 months ago

Hi, on TF 2.13.0 I get this warning as well when training a simple encoder-decoder EN-ES translation model with LSTMs consuming embedded strings with mask_zero=True:

2023-10-07 11:39:56.995271: W tensorflow/core/common_runtime/type_inference.cc:339] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1: type_id: TFT_OPTIONAL args { type_id: TFT_PRODUCT args { type_id: TFT_TENSOR args { type_id: TFT_INT32 } } } is neither a subtype nor a supertype of the combined inputs preceding it: type_id: TFT_OPTIONAL args { type_id: TFT_PRODUCT args { type_id: TFT_TENSOR args { type_id: TFT_FLOAT } } }

for Tuple type infernce function 0
while inferring type of node 'cond_40/output/_23'

The model trains, but when I then enable jit_compile=True, fit() breaks with:

2023-10-07 11:46:15.327751: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at xla_ops.cc:444 : INVALID_ARGUMENT: Trying to access resource 7590 located in device /job:localhost/replica:0/task:0/device:CPU:0 from device /job:localhost/replica:0/task:0/device:GPU:0

Someone on Stack Exchange suggested that this JIT failure might be linked to TF creating something as INT32 instead of FLOAT32, resulting in some variables being placed on the CPU, which seems related to the error mentioned above.
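For what it's worth, a minimal sketch of the jit_compile path that broke here (the tiny masked LSTM model is illustrative, not the actual translation model):

import numpy as np
import tensorflow as tf

# Illustrative stand-in for the translation model: embeddings with
# mask_zero=True feeding an LSTM, compiled with XLA enabled.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=50, output_dim=8, mask_zero=True),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(50),
])
model.compile(
    optimizer="rmsprop",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    jit_compile=True,  # the OP_REQUIRES failure above occurred in fit() with this set
)
x = np.random.randint(1, 50, size=(32, 10))
y = np.random.randint(0, 50, size=(32,))
model.fit(x, y, epochs=1)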

Frank-Schiro commented 2 months ago

Still getting this error; has there been any update?

Epoch 1/20
2024-06-18 14:04:10.665333: W tensorflow/core/common_runtime/type_inference.cc:339] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_INT32
    }
  }
}
 is neither a subtype nor a supertype of the combined inputs preceding it:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_FLOAT
    }
  }
}

    while inferring type of node 'cond_42/output/_24'
2024-06-18 14:04:10.835800: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:630] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
neevmanvar commented 1 month ago

I solved a similar error in my code; here's how I did it.

I think the problem arises when using @tf.function, or any function containing a condition, while running the TF graph; in my case it showed up during model.fit().

The warning says that an invalid graph escaped type checking. When you write an if/else statement inside @tf.function code, AutoGraph converts it into tf.cond(). During model.fit(), TensorFlow emits this warning when elif is used; to avoid it, replace the elif branches with plain if statements. That solved the problem for me, as the illustration below shows.
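To illustrate the claim (a hand-written approximation, not AutoGraph's literal output): an if/elif chain over tensor conditions becomes nested tf.cond calls, and type inference must reconcile the output types of every branch.

import tensorflow as tf

# Roughly what AutoGraph generates for an if/elif/else on tensor
# conditions: each elif becomes another tf.cond nested in the false
# branch, and all branch outputs must agree in dtype and structure.
@tf.function
def rescale(x):
    return tf.cond(
        tf.reduce_max(x) > 1.0,
        lambda: x / 255.0,                # if branch
        lambda: tf.cond(                  # elif becomes a nested cond
            tf.reduce_min(x) < 0.0,
            lambda: (x + 1.0) / 2.0,
            lambda: x,                    # else branch
        ),
    )

print(rescale(tf.constant([0.0, 128.0, 255.0])))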

Implementation of the function before the fix; it was used in a loss function passed to model.compile() and later to model.fit():

import tensorflow as tf

class RescaleImage:
    def __init__(self) -> None:
        super().__init__()

    @tf.function
    def normalize(self, x: tf.Tensor, min_val: float = 0.0, max_val: float = 1.0) -> tf.Tensor:
        min_val = tf.cast(min_val, tf.float32)
        max_val = tf.cast(max_val, tf.float32)
        # Input looks like [0, 255]: rescale to the requested range.
        # AutoGraph turns this if/elif chain into nested tf.cond ops.
        if tf.reduce_max(x) > 1.0 and tf.reduce_min(x) >= 0.0:
            if min_val == 0.0 and max_val == 1.0:
                x = x / 255.0
            elif min_val == -1.0 and max_val == 1.0:
                x = (x - 127.5) / 127.5
        # Input looks like [-1, 1].
        elif tf.reduce_max(x) <= 1.0 and tf.reduce_min(x) >= -1.0 and tf.reduce_min(x) < 0.0:
            if min_val == 0.0 and max_val == 1.0:
                x = (x + 1.0) / 2.0
            elif min_val == 0.0 and max_val == 255.0:
                x = (x + 1.0) * 255.0 / 2.0
        # Input looks like [0, 1].
        elif tf.reduce_max(x) <= 1.0 and tf.reduce_min(x) >= 0.0:
            if min_val == -1.0 and max_val == 1.0:
                x = (x - 0.5) / 0.5
            elif min_val == 0.0 and max_val == 255.0:
                x = x * 255.0
        return x

    @tf.function
    def normalize_individual(self, x: tf.Tensor, min_val: float = 0.0, max_val: float = 1.0) -> tf.Tensor:
        min_val = tf.cast(min_val, tf.float32)
        max_val = tf.cast(max_val, tf.float32)
        # Input looks like [0, 255]: stretch each input to [min_val, max_val].
        if tf.reduce_max(x) > 1.0 and tf.reduce_min(x) >= 0.0:
            factor = (max_val - min_val) / (tf.math.reduce_max(x) - tf.math.reduce_min(x))
            x = factor * (x - tf.math.reduce_min(x)) + min_val
        # Input looks like [-1, 1].
        elif tf.reduce_max(x) <= 1.0 and tf.reduce_min(x) >= -1.0 and tf.reduce_min(x) < 0.0:
            if min_val == 0.0 and max_val == 1.0:
                x = (x + 1.0) / 2.0
            elif min_val == 0.0 and max_val == 255.0:
                x = (x + 1.0) * 255.0 / 2.0
        # Input looks like [0, 1].
        elif tf.reduce_max(x) <= 1.0 and tf.reduce_min(x) >= 0.0:
            if min_val == -1.0 and max_val == 1.0:
                x = (x - 0.5) / 0.5
            elif min_val == 0.0 and max_val == 255.0:
                x = x * 255.0
        return x

Code after the fix (using plain if statements in place of elif):

import tensorflow as tf

class RescaleImage:
    def __init__(self) -> None:
        super().__init__()

    @tf.function
    def normalize(self, x: tf.Tensor, min_val: float = 0.0, max_val: float = 1.0) -> tf.Tensor:
        min_val = tf.cast(min_val, tf.float32)
        max_val = tf.cast(max_val, tf.float32)
        # Same logic as before, but every elif is now an independent if,
        # which (per the explanation above) avoids the type-inference warning.
        if tf.reduce_max(x) > 1.0 and tf.reduce_min(x) >= 0.0:
            if min_val == 0.0 and max_val == 1.0:
                x = x / 255.0
            if min_val == -1.0 and max_val == 1.0:
                x = (x - 127.5) / 127.5

        if tf.reduce_max(x) <= 1.0 and tf.reduce_min(x) >= -1.0 and tf.reduce_min(x) < 0.0:
            if min_val == 0.0 and max_val == 1.0:
                x = (x + 1.0) / 2.0
            if min_val == 0.0 and max_val == 255.0:
                x = (x + 1.0) * 255.0 / 2.0

        if tf.reduce_max(x) <= 1.0 and tf.reduce_min(x) >= 0.0:
            if min_val == -1.0 and max_val == 1.0:
                x = (x - 0.5) / 0.5
            if min_val == 0.0 and max_val == 255.0:
                x = x * 255.0

        return x

    @tf.function
    def normalize_individual(self, x: tf.Tensor, min_val: float = 0.0, max_val: float = 1.0) -> tf.Tensor:
        min_val = tf.cast(min_val, tf.float32)
        max_val = tf.cast(max_val, tf.float32)
        if tf.reduce_max(x) > 1.0 and tf.reduce_min(x) >= 0.0:
            factor = (max_val - min_val) / (tf.math.reduce_max(x) - tf.math.reduce_min(x))
            x = factor * (x - tf.math.reduce_min(x)) + min_val

        if tf.reduce_max(x) <= 1.0 and tf.reduce_min(x) >= -1.0 and tf.reduce_min(x) < 0.0:
            if min_val == 0.0 and max_val == 1.0:
                x = (x + 1.0) / 2.0
            if min_val == 0.0 and max_val == 255.0:
                x = (x + 1.0) * 255.0 / 2.0

        if tf.reduce_max(x) <= 1.0 and tf.reduce_min(x) >= 0.0:
            if min_val == -1.0 and max_val == 1.0:
                x = (x - 0.5) / 0.5
            if min_val == 0.0 and max_val == 255.0:
                x = x * 255.0

        return x