Nested `WideDeepModel` fails when saved due to a bug on the optimizer serialization

datumbox commented 1 year ago

Describe the problem.

A model that contains a nested WideDeepModel submodel throws an error when saved. The problem goes back at least since TFv2.9. I've tested the latest nightly and the issue persists on latest master. I've also reproduced this successfully both on Linux and MacOS.

Error message:

  File "./keras/saving/legacy/saving_utils.py", line 211, in model_metadata
    "config": model.optimizer.get_config(),
AttributeError: 'ListWrapper' object has no attribute 'get_config'

Describe the current behavior.

I tried to investigate the source of the issue and I think there are a few problems in the current implementation:

The implementation of model_metadata() doesn't anticipate receiving a list of optimizers.
The model.save(..., include_optimizer=False) parameter is not respected and model_metadata() is called with its default True value.

Describe the expected behavior.

I believe the expected behaviour is:

The model_metadata() can handle lists of optimizers.
The initially declared include_optimizer value should be respected.

Contributing.

Do you want to contribute a PR? (yes/no): yes
If yes, please read this page for instructions
Briefly describe your candidate solution(if contributing):

It's been years since my last contribution to Keras and I'm very rusty with TensorFlow, so take what I say with a grain of salt. I believe that the model_metadata() needs to handle both single optimizers and lists of optimizers. Optionally the original value of include_optimizer should be passed around, so that it's respected. Unfortunately, the latter may require API changes, so perhaps we should find a workaround.

Standalone code to reproduce the issue.

I've added a quick and dirty test on the wide_deep_test.py file:

    def test_config_nested(self):
        input1 = input_layer.Input(shape=(1,))
        output1 = linear.LinearModel()(input1)
        linear_model = keras.Model(input1, output1)

        input2 = input_layer.Input(shape=(1,))
        output2 = core.Dense(units=1)(input2)
        dnn_model = keras.Model(input2, output2)

        wide_deep_model = wide_deep.WideDeepModel(linear_model, dnn_model)
        wide_deep_model.compile(optimizer=['adam', 'adam'])

        output = wide_deep_model([input1, input2])
        model = keras.Model([input1, input2], output)
        model.compile()

        model.save("./deleteme", save_format="tf", include_optimizer=False)

Source code / logs.

Full trace log:

Traceback (most recent call last):
  File "./keras/premade_models/wide_deep_test.py", line 315, in test_config_nested
    model.save("./deleteme", save_format="tf", include_optimizer=False)
  File "./keras/utils/traceback_utils.py", line 61, in error_handler
    return fn(*args, **kwargs)
  File "./keras/engine/training.py", line 2985, in save
    saving_api.save_model(
  File "./keras/saving/saving_api.py", line 163, in save_model
    return legacy_sm_saving_lib.save_model(
  File "./keras/utils/traceback_utils.py", line 61, in error_handler
    return fn(*args, **kwargs)
  File "./keras/saving/legacy/save.py", line 168, in save_model
    saved_model_save.save(
  File "./keras/saving/legacy/saved_model/save.py", line 103, in save
    metadata = generate_keras_metadata(saved_nodes, node_paths)
  File "./keras/saving/legacy/saved_model/save.py", line 132, in generate_keras_metadata
    metadata=node._tracking_metadata,
  File "./keras/engine/base_layer.py", line 3484, in _tracking_metadata
    return self._trackable_saved_model_saver.tracking_metadata
  File "./keras/saving/legacy/saved_model/base_serialization.py", line 54, in tracking_metadata
    return json_utils.Encoder().encode(self.python_properties)
  File "./keras/saving/legacy/saved_model/layer_serialization.py", line 37, in python_properties
    return self._python_properties_internal()
  File "./keras/saving/legacy/saved_model/model_serialization.py", line 41, in _python_properties_internal
    saving_utils.model_metadata(
  File "./keras/saving/legacy/saving_utils.py", line 211, in model_metadata
    "config": model.optimizer.get_config(),
AttributeError: 'ListWrapper' object has no attribute 'get_config'

SuryanarayanaY commented 1 year ago

Hi @datumbox ,

The error is due to passing optimizer as list. As per documentation here the optimizer should be a string like say 'adam' or it should be instance of Optimizer class say tf.keras.optimizers.Adam(). Instead as you are passing as list it causing the error.

Please refer to attached gist for demo .

Please refer the documentation snapshot.

Could you please test with the single optimizer as mentioned and let us know if the test fails. Thankyou!

datumbox commented 1 year ago

@SuryanarayanaY I believe I'm using the model as I am supposed to.

The implementation of WideDeepModel is supposed to support a list of optimizers. One for the Wide and one for the Deep part. Please check this code example from the documentation:

linear_model = LinearModel()
linear_model.compile('adagrad', 'mse')
linear_model.fit(linear_inputs, y, epochs)
dnn_model = keras.Sequential([keras.layers.Dense(units=1)])
dnn_model.compile('rmsprop', 'mse')
dnn_model.fit(dnn_inputs, y, epochs)
combined_model = WideDeepModel(linear_model, dnn_model)
combined_model.compile(optimizer=['sgd', 'adam'],
                       loss='mse', metrics=['mse'])
combined_model.fit([linear_inputs, dnn_inputs], y, epochs)

Does what I'm saying make sense?

SuryanarayanaY commented 1 year ago

Hi @datumbox ,

Thanks for pointing this. For using Multiple Optimizers we need to use tfa.optimizers.MultiOptimizer wrapper from tensorflow add-ons. Please check the documentation here.

I Agree that the documentation code example seems creating some confusion here.Either this is valid bug or needs to be documented properly. Request you to please add a wrapper of MultiOptimzer mentioned above and let us know if the test passes with this change. Thanks!

datumbox commented 1 year ago

@SuryanarayanaY I'm a bit confused on why we should use components from the tensorflow add-ons project. Wouldn't this put an unwanted dependency to Keras and as a result to TF? Also the MultiOptimizer you propose has a completely different API than the one that WideDeepModel seems to intend to support: the first requires explicit definition of the specs while the latter is supposed to handle automatically the Wide and Deep parts directly.

IMO this is a bug because the WideDeepModel is supposed to handle lists of optimizers. This is supported by the additional examples provided on the migration guide and from the code of the class: https://github.com/keras-team/keras/blob/cdb89b47d13fc4b882911eae3a366a395d823275/keras/premade_models/wide_deep.py#L128-L138

Please have a look on our side and let me know if you agree. Thanks!

SuryanarayanaY commented 1 year ago

@datumbox ,

Thanks for pointing this. I can see this model uses custom training step(by overriding the train_step method) and implemented to train with both optimizers. It seems there is some bug in model.save() and it unable to handle the list of optimizers when model.save(...,include_optimizer=True) which is default.

However with model.save(...,include_optimizer=False) there is no error in saving the model. The issue persists at least from TF2.5v to latest tf-nightly as per attached gists.

It needs to dig the issue to more to check the reason for this behaviour. Meanwhile if you want to workout and have a plan please feel free to propose a PR. Thanks!

datumbox commented 1 year ago

It needs to dig the issue to more to check the reason for this behaviour.

That make sense. I'll wait for your feedback.

Meanwhile if you want to workout and have a plan please feel free to propose a PR.

Sounds good, give me some time to think about it.

However with model.save(...,include_optimizer=False) there is no error in saving the model.

I don't think this is true in all cases. As I mentioned above, a model that has a nested WideDeepModel submodel can't be saved even if include_optimizer=False. This is because the value of this flag is ignored.

nkovela1 commented 1 year ago

Hi @datumbox , thanks for reaching out on this issue! I have tested your code with our new saving format (Keras v3), and it seems that our name-based saving fixes the issue. Please confirm on your end:

Replace model.save('./deleteme', save_format='tf', include_optimizer=False) with model.save('./deleteme.keras', save_format='keras_v3')

Thanks! Please let me know if this works.

datumbox commented 1 year ago

@nkovela1 Thanks for the reply. I confirm that model.save("./deleteme.keras", save_format="keras_v3") solves the issue.

I read briefly about this new format and see that now the advised way of exporting for non Python usages is: model.export("./deleteme"). Unfortunately this throws an error:

  File "./keras/premade_models/wide_deep_test.runfiles/org_keras/keras/premade_models/wide_deep_test.py", line 317, in test_config_nested
    model.export("./deleteme")
  File "./keras/premade_models/wide_deep_test.runfiles/org_keras/keras/engine/training.py", line 3602, in export
    export_lib.export_model(self, filepath)
  File "./keras/premade_models/wide_deep_test.runfiles/org_keras/keras/export/export_lib.py", line 409, in export_model
    export_archive.write_out(filepath)
  File "./keras/premade_models/wide_deep_test.runfiles/org_keras/keras/export/export_lib.py", line 338, in write_out
    self._filter_and_track_resources()
  File "./keras/premade_models/wide_deep_test.runfiles/org_keras/keras/export/export_lib.py", line 378, in _filter_and_track_resources
    tvs, ntvs = _list_variables_used_by_fns(fns)
  File "./keras/premade_models/wide_deep_test.runfiles/org_keras/keras/export/export_lib.py", line 561, in _list_variables_used_by_fns
    concrete_functions = [fn.get_concrete_function()]
  File "/opt/miniforge3/envs/tfv2/lib/python3.10/site-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py", line 1204, in get_concrete_function
    concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
  File "/opt/miniforge3/envs/tfv2/lib/python3.10/site-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py", line 1184, in _get_concrete_function_garbage_collected
    self._initialize(args, kwargs, add_initializers_to=initializers)
  File "/opt/miniforge3/envs/tfv2/lib/python3.10/site-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py", line 709, in _initialize
    self._variable_creation_fn    # pylint: disable=protected-access
  File "/opt/miniforge3/envs/tfv2/lib/python3.10/site-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py", line 176, in _get_concrete_function_internal_garbage_collected
    concrete_function, _ = self._maybe_define_concrete_function(args, kwargs)
  File "/opt/miniforge3/envs/tfv2/lib/python3.10/site-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py", line 171, in _maybe_define_concrete_function
    return self._maybe_define_function(args, kwargs)
  File "/opt/miniforge3/envs/tfv2/lib/python3.10/site-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py", line 401, in _maybe_define_function
    concrete_function = self._create_concrete_function(
  File "/opt/miniforge3/envs/tfv2/lib/python3.10/site-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py", line 305, in _create_concrete_function
    func_graph_module.func_graph_from_py_func(
  File "/opt/miniforge3/envs/tfv2/lib/python3.10/site-packages/tensorflow/python/framework/func_graph.py", line 1078, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/opt/miniforge3/envs/tfv2/lib/python3.10/site-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py", line 613, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "./keras/premade_models/wide_deep_test.runfiles/org_keras/keras/utils/traceback_utils.py", line 61, in error_handler
    return fn(*args, **kwargs)
  File "./keras/premade_models/wide_deep_test.runfiles/org_keras/keras/engine/training.py", line 561, in __call__
    return super().__call__(*args, **kwargs)
  File "./keras/premade_models/wide_deep_test.runfiles/org_keras/keras/utils/traceback_utils.py", line 61, in error_handler
    return fn(*args, **kwargs)
  File "./keras/premade_models/wide_deep_test.runfiles/org_keras/keras/engine/base_layer.py", line 1113, in __call__
    input_spec.assert_input_compatibility(
  File "./keras/premade_models/wide_deep_test.runfiles/org_keras/keras/engine/input_spec.py", line 219, in assert_input_compatibility
    raise ValueError(
ValueError: Layer "model_2" expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'args_0:0' shape=(None, 1) dtype=float32>]

It seems there are a few papercuts still with the new API. Do you think it's worth patching the issue reported her until the new API is more mature? Let me know if I can help.

keras-team / tf-keras

Nested `WideDeepModel` fails when saved due to a bug on the optimizer serialization #51