mlflow / mlflow

Open source platform for the machine learning lifecycle
https://mlflow.org
Apache License 2.0

[BUG] Using a signature in a TensorFlow model throws an exception #4223

Closed · saschaschramm closed this issue 3 years ago

saschaschramm commented 3 years ago


Describe the problem

Actual behavior:

When a TensorFlow model with a signature is serialized and then used for a prediction, the MLflow predict method throws an exception.

Saving the model with a signature (this works):

@tf.function(input_signature=[tf.TensorSpec(shape=[1, 1], dtype=tf.float32, name='input')])
def serve_fn(example):
    outputs = model(example)
    return outputs

tf.saved_model.save(model, TF_MODEL_DIR,
                    signatures={'serving_default': serve_fn.get_concrete_function()})

mlflow.tensorflow.log_model(
    tf_saved_model_dir=TF_MODEL_DIR,
    tf_meta_graph_tags=['serve'],
    tf_signature_def_key='serving_default',
    artifact_path='mlflow_model'
)

This throws an exception:

mlflow_model = mlflow.pyfunc.load_model(model_uri)
prediction = mlflow_model.predict(feed_dict)

Interestingly, no exception is thrown when the _TF2Wrapper is instantiated directly:

loaded_model = tf.saved_model.load(
    export_dir=f'mlruns/0/{run.info.run_id}/artifacts/mlflow_model/tfmodel',
    tags=['serve']
)
tf2_wrapper = _TF2Wrapper(infer=loaded_model.signatures['serving_default'])
prediction = tf2_wrapper.predict(feed_dict)
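
A possible explanation (speculation on my part, not confirmed in this issue): in the snippet above, loaded_model stays in scope, so the variables captured by the serving_default function stay alive. In TF2, a signature function obtained from tf.saved_model.load does not by itself keep the loaded model alive; if the loaded object is garbage collected, calling the function can fail with exactly the FailedPreconditionError shown in the traceback below. A minimal sketch of that failure mode, assuming the SavedModel from this report exists at model/:

import gc

import tensorflow as tf

# Keep only the signature function; the loaded model object itself
# becomes unreachable as soon as this expression finishes.
infer = tf.saved_model.load('model').signatures['serving_default']
gc.collect()  # the owning object (and its resource variables) may now be destroyed

# This call may then raise FailedPreconditionError
# ("Error while reading resource variable ..."), matching the traceback below.
prediction = infer(input=tf.constant([[2.0]]))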

Expected behavior: mlflow_model.predict(feed_dict) shouldn't throw an exception when using a signature.

Code to reproduce issue

import unittest

import mlflow
import numpy as np
import tensorflow as tf
from mlflow.tensorflow import _TF2Wrapper

class TestTfModelPredict(unittest.TestCase):

    def test_tf_model_predict(self):
        tf.random.set_seed(1337)
        TF_MODEL_DIR = "model"

        # Build model
        inputs = tf.keras.layers.Input(shape=1, name='input', dtype=tf.float32)
        outputs = tf.keras.layers.Dense(1)(inputs)
        model = tf.keras.Model(inputs=inputs, outputs=[outputs])

        # Compile model
        model.compile(
            optimizer='rmsprop',
            loss='categorical_crossentropy',
            metrics=['accuracy'])

        # Serving signature
        @tf.function(input_signature=[tf.TensorSpec(shape=[1, 1], dtype=tf.float32, name='input')])
        def serve_fn(example):
            outputs = model(example)
            return outputs

        # Save model with serving signature
        tf.saved_model.save(model, TF_MODEL_DIR,
                            signatures={'serving_default': serve_fn.get_concrete_function()})

        feed_dict = {"input": tf.constant([[2.0]])}
        loaded_model = tf.saved_model.load(TF_MODEL_DIR)

        infer = loaded_model.signatures['serving_default']
        prediction = infer(**feed_dict)

        # This works as expected
        np.testing.assert_allclose(prediction['output_0'], np.asarray([[-0.09599352]]))

        # Save as mlflow model
        with mlflow.start_run() as run:
            mlflow.tensorflow.log_model(
                tf_saved_model_dir=TF_MODEL_DIR,
                tf_meta_graph_tags=['serve'],
                tf_signature_def_key='serving_default',
                artifact_path='mlflow_model'
            )

            loaded_model = tf.saved_model.load(
                export_dir=f'mlruns/0/{run.info.run_id}/artifacts/mlflow_model/tfmodel',
                tags=['serve']
            )

            tf2_wrapper = _TF2Wrapper(infer=loaded_model.signatures['serving_default'])
            prediction = tf2_wrapper.predict(feed_dict)

            # This works as expected
            np.testing.assert_allclose(prediction['output_0'], np.asarray([-0.09599352]))
            model_uri = f'mlruns/0/{run.info.run_id}/artifacts/mlflow_model'
            mlflow_model = mlflow.pyfunc.load_model(model_uri)

            # This doesn't work (!!!)
            prediction = mlflow_model.predict(feed_dict)
            np.testing.assert_allclose(prediction['output_0'], np.asarray([-0.09599352]))

if __name__ == '__main__':
    unittest.main()
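
Until this is fixed in MLflow, a workaround consistent with the passing path in the test above is to bypass pyfunc and call the logged SavedModel directly, keeping the loaded object referenced for as long as the signature function is used. A sketch, reusing run and feed_dict from the repro (note that _TF2Wrapper is an MLflow-internal class and may change without notice):

import tensorflow as tf
from mlflow.tensorflow import _TF2Wrapper

# Load the logged tfmodel directory directly and keep `loaded_model`
# referenced so its captured variables are not garbage collected.
loaded_model = tf.saved_model.load(
    export_dir=f'mlruns/0/{run.info.run_id}/artifacts/mlflow_model/tfmodel',
    tags=['serve']
)
infer = loaded_model.signatures['serving_default']

# Either call the signature function directly ...
prediction = infer(**feed_dict)

# ... or go through the internal wrapper, as in the test above.
prediction = _TF2Wrapper(infer=infer).predict(feed_dict)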

Other info / logs

Error
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/test-mlflow2/lib/python3.8/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/usr/local/anaconda3/envs/test-mlflow2/lib/python3.8/unittest/case.py", line 676, in run
    self._callTestMethod(testMethod)
  File "/usr/local/anaconda3/envs/test-mlflow2/lib/python3.8/unittest/case.py", line 633, in _callTestMethod
    method()
  File "/Users/schramm/PycharmProjects/kiwi/test-mlflow2/tensorflow/test.py", line 62, in test_tf_model_predict
    prediction = mlflow_model.predict(feed_dict)
  File "/usr/local/anaconda3/envs/test-mlflow2/lib/python3.8/site-packages/mlflow/pyfunc/__init__.py", line 575, in predict
    return self._model_impl.predict(data)
  File "/usr/local/anaconda3/envs/test-mlflow2/lib/python3.8/site-packages/mlflow/tensorflow.py", line 582, in predict
    raw_preds = self.infer(**feed_dict)
  File "/usr/local/anaconda3/envs/test-mlflow2/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1669, in __call__
    return self._call_impl(args, kwargs)
  File "/usr/local/anaconda3/envs/test-mlflow2/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1678, in _call_impl
    return self._call_with_structured_signature(args, kwargs,
  File "/usr/local/anaconda3/envs/test-mlflow2/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1759, in _call_with_structured_signature
    return self._call_flat(
  File "/usr/local/anaconda3/envs/test-mlflow2/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 115, in _call_flat
    return super(_WrapperFunction, self)._call_flat(args, captured_inputs,
  File "/usr/local/anaconda3/envs/test-mlflow2/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1918, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/usr/local/anaconda3/envs/test-mlflow2/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 555, in call
    outputs = execute.execute(
  File "/usr/local/anaconda3/envs/test-mlflow2/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable _AnonymousVar15 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar15/N10tensorflow3VarE does not exist.
  [[{{node StatefulPartitionedCall/model/dense/BiasAdd/ReadVariableOp}}]] [Op:__inference_signature_wrapper_932]

Function call stack: signature_wrapper
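
Reading the traceback: the failure originates in mlflow/tensorflow.py's predict at self.infer(**feed_dict), and the missing resource _AnonymousVar15 maps to the dense layer's bias (node StatefulPartitionedCall/model/dense/BiasAdd/ReadVariableOp). That is consistent with a captured variable having been destroyed before the call, which again points at the reference-lifetime issue sketched earlier; this is an inference from the error text, not a confirmed diagnosis.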


dbczumar commented 3 years ago

Thanks for your fix, @saschaschramm! I'm going to go ahead and close this out.