tensorflow / addons

Useful extra functionality for TensorFlow 2.x maintained by SIG-addons
Apache License 2.0

Trying to save a model compiled with tfa.MultiOptimizer | gives error saying `TypeError: ('Not JSON Serializable:', ...` #2771

Closed maifeeulasad closed 2 years ago

maifeeulasad commented 2 years ago

System information

Describe the bug

I'm trying to save a model which is developed and compiled with MultiOptimizer provided by tensorflow-addons. But it keeps giving me an error saying:

TypeError: ('Not JSON Serializable:', ...

I tried with different models, environments, and versions.

Code to reproduce the issue

Kernel: https://www.kaggle.com/code/maifeeulasad/tfa-multioptimizer-model-save?scriptVersionId=108636090

Other info / logs

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_20/1141714396.py in <module>
     24         batch_size=32,
     25       callbacks=callbacks,
---> 26         validation_data=(valid_xs, valid_ys))

/opt/conda/lib/python3.7/site-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1228           epoch_logs.update(val_logs)
   1229 
-> 1230         callbacks.on_epoch_end(epoch, epoch_logs)
   1231         training_logs = epoch_logs
   1232         if self.stop_training:

/opt/conda/lib/python3.7/site-packages/keras/callbacks.py in on_epoch_end(self, epoch, logs)
    411     logs = self._process_logs(logs)
    412     for callback in self.callbacks:
--> 413       callback.on_epoch_end(epoch, logs)
    414 
    415   def on_train_batch_begin(self, batch, logs=None):

/tmp/ipykernel_20/3193528328.py in on_epoch_end(self, epoch, logs)
      3         print('name: ' + self.model._name)
      4         self.model.save('epoch-' + str(epoch + 1) + '-' + self.model._name + '.h5',overwrite=True,
----> 5     include_optimizer=True,)
      6 
      7 callbacks = [ModelSaverCallback()]

/opt/conda/lib/python3.7/site-packages/keras/engine/training.py in save(self, filepath, overwrite, include_optimizer, save_format, signatures, options, save_traces)
   2144     # pylint: enable=line-too-long
   2145     save.save_model(self, filepath, overwrite, include_optimizer, save_format,
-> 2146                     signatures, options, save_traces)
   2147 
   2148   def save_weights(self,

/opt/conda/lib/python3.7/site-packages/keras/saving/save.py in save_model(model, filepath, overwrite, include_optimizer, save_format, signatures, options, save_traces)
    144           'or using `save_weights`.')
    145     hdf5_format.save_model_to_hdf5(
--> 146         model, filepath, overwrite, include_optimizer)
    147   else:
    148     with generic_utils.SharedObjectSavingScope():

/opt/conda/lib/python3.7/site-packages/keras/saving/hdf5_format.py in save_model_to_hdf5(model, filepath, overwrite, include_optimizer)
    112       if isinstance(v, (dict, list, tuple)):
    113         f.attrs[k] = json.dumps(
--> 114             v, default=json_utils.get_json_type).encode('utf8')
    115       else:
    116         f.attrs[k] = v

/opt/conda/lib/python3.7/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    236         check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    237         separators=separators, default=default, sort_keys=sort_keys,
--> 238         **kw).encode(obj)
    239 
    240 

/opt/conda/lib/python3.7/json/encoder.py in encode(self, o)
    197         # exceptions aren't as detailed.  The list call should be roughly
    198         # equivalent to the PySequence_Fast that ''.join() would do.
--> 199         chunks = self.iterencode(o, _one_shot=True)
    200         if not isinstance(chunks, (list, tuple)):
    201             chunks = list(chunks)

/opt/conda/lib/python3.7/json/encoder.py in iterencode(self, o, _one_shot)
    255                 self.key_separator, self.item_separator, self.sort_keys,
    256                 self.skipkeys, _one_shot)
--> 257         return _iterencode(o, 0)
    258 
    259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

/opt/conda/lib/python3.7/site-packages/keras/saving/saved_model/json_utils.py in get_json_type(obj)
    140     return obj.value
    141 
--> 142   raise TypeError('Not JSON Serializable:', obj)

TypeError: ('Not JSON Serializable:', <tf.Tensor 'gradient_tape/model_w_multioptimizer/dense_2/MatMul:0' shape=(3, 12) dtype=float32>)

Ref: https://github.com/tensorflow/tensorflow/issues/58184
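
The last frames of the traceback above show `json.dumps` being called with a `default=` fallback hook, and that hook raising `TypeError` with two positional arguments, which is why the message prints as a tuple. A minimal sketch of that mechanism in plain Python follows; `FakeTensor`, the simplified `get_json_type`, and `try_dump` are hypothetical stand-ins for illustration, not the Keras implementation.

```python
import json


class FakeTensor:
    """Hypothetical stand-in for a tf.Tensor; json cannot encode it natively."""

    def __repr__(self):
        return "<FakeTensor shape=(3, 12)>"


def get_json_type(obj):
    """Simplified fallback hook, loosely modelled on the one in the traceback."""
    # Objects exposing get_config are serialized through their config.
    if hasattr(obj, "get_config"):
        return {"class_name": obj.__class__.__name__, "config": obj.get_config()}
    # Functions are serialized by name.
    if callable(obj):
        return obj.__name__
    # Two positional arguments, so the exception prints as a tuple.
    raise TypeError("Not JSON Serializable:", obj)


def try_dump(value):
    """Return the JSON string, or the TypeError args if encoding fails."""
    try:
        return json.dumps(value, default=get_json_type)
    except TypeError as err:
        return err.args


# Plain values and a callable are handled by the hook.
ok = try_dump({"learning_rate": 3e-4, "loss": len})
# An object the hook does not recognise makes the whole dump fail.
bad = try_dump({"gv": [FakeTensor()]})
```

Here `ok` is a JSON string, while `bad` holds the exception arguments: the message and the offending object, mirroring the shape of the error reported above.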

bhack commented 2 years ago

Can you check whether the tests we introduced with https://github.com/tensorflow/addons/pull/2719 cover your case?

If not, can you extend the tests with a PR to cover your case?

Thanks

maifeeulasad commented 2 years ago

@bhack it covers my case. Not sure what is causing this error. Though I need to test with a few different versions of TF and TF-addons.


bhack commented 2 years ago

Ok let me know.

maifeeulasad commented 2 years ago

Okay, I was able to reproduce this issue with TF-2.6.4 and TFA-0.14.0, but I couldn't reproduce it with the current latest versions, TF-2.10.0 and TFA-0.18.0. I guess upgrading the kernel will resolve my issue, or I can simply override the MultiOptimizer.get_config method, as already implemented here: https://github.com/JackWindows/tf-addons/blob/1579bb1938c640fc36b224c3d903bfd418a3d40e/tensorflow_addons/optimizers/discriminative_layer_training.py#L146-L154.
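
A rough sketch of what such a `get_config` override might look like, written against toy classes so it runs without TensorFlow. It assumes the optimizer specs are dicts that carry persistent entries plus transient per-step gradient state; the key names `optimizer`, `weights`, and `gv`, and the classes `FakeOptimizer` and `FakeMultiOptimizer`, are assumptions for illustration and are not copied from the linked file.

```python
import json


class FakeOptimizer:
    """Hypothetical stand-in for a Keras optimizer with a JSON-safe config."""

    def __init__(self, name, learning_rate):
        self.name = name
        self.learning_rate = learning_rate

    def get_config(self):
        return {"name": self.name, "learning_rate": self.learning_rate}


class FakeMultiOptimizer:
    """Toy wrapper; the spec keys are assumptions, not the tfa source."""

    def __init__(self, optimizer_specs):
        self.optimizer_specs = optimizer_specs

    def get_config(self):
        # Copy only the persistent keys; skip per-step gradient state ("gv").
        specs = [
            {
                "optimizer": spec["optimizer"].get_config(),
                "weights": list(spec["weights"]),
            }
            for spec in self.optimizer_specs
        ]
        return {"name": "MultiOptimizer", "optimizer_specs": specs}


# object() plays the role of a gradient tensor that json cannot encode.
specs = [
    {"optimizer": FakeOptimizer("SGD", 3e-4), "weights": ["dense/kernel:0"], "gv": [object()]},
    {"optimizer": FakeOptimizer("Adam", 3e-4), "weights": ["dense_1/kernel:0"], "gv": [object()]},
]
config = FakeMultiOptimizer(specs).get_config()
encoded = json.dumps(config)  # succeeds because no "gv" entries remain
```

The sketch calls `get_config` on each wrapped optimizer directly, which is a simplification. The point is only that the returned config is rebuilt from the persistent keys instead of exposing the live spec dicts.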

Thanks @bhack

Log of reproduction

```shell
maifee@MUA-HP:~/addons/tensorflow_addons/optimizers/tests$ ls discriminative_layer_training_test.py | entr python3 -m pytest -s discriminative_layer_training_test.py
2022-10-21 12:26:03.618127: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-10-21 12:26:03.618573: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-10-21 12:26:07.041521: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-10-21 12:26:07.041960: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-10-21 12:26:07.042303: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (MUA-HP): /proc/driver/nvidia/version does not exist
================================================= test session starts ==================================================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
rootdir: /home/maifee/addons, configfile: pytest.ini
plugins: typeguard-2.13.3
collected 1 item

discriminative_layer_training_test.py 2022-10-21 12:26:07.088950: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2.6.4
0.14.0
2022-10-21 12:26:07.261232: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/2
4/4 [==============================] - 1s 80ms/step - loss: 15.4249 - accuracy: 0.0000e+00 - val_loss: 15.4249 - val_accuracy: 0.0000e+00

BEFORE name: some-model
F

======================================================= FAILURES =======================================================
________________________________________ test_serialization_after_training[cpu] ________________________________________

tmpdir = local('/tmp/pytest-of-maifee/pytest-30/test_serialization_after_train0')

    def test_serialization_after_training(tmpdir):
        print(tf.__version__)
        print(tfa.__version__)
        class ModelSaverCallback(tf.keras.callbacks.Callback):
            def on_epoch_end(self, epoch, logs=None):
                print()
                print('BEFORE name: ' + self.model._name)
                self.model.save(tmpdir + 'epoch-' + str(epoch + 1) + '-' + self.model._name + '.h5',overwrite=True,include_optimizer=True,)
                print('AFTER name: ' + self.model._name)

        callbacks = [ModelSaverCallback()]

        x = np.array(np.ones([100,3]))
        y = np.array(np.ones([100]))
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(2, activation = 'relu', input_shape=(len(x[0]),)),
            tf.keras.layers.Dense(1, activation = 'tanh')
        ])
        model._name = "some-model"

        opt1 = tf.keras.optimizers.SGD(learning_rate=3e-4)
        opt2 = tf.keras.optimizers.Adam(learning_rate=3e-4)
        opt_layer_pairs = [(opt1, model.layers[:1]), (opt2, model.layers[1:])]
        optimizer = MultiOptimizer(opt_layer_pairs)

        # Train the model for a few epochs.
        model.compile(optimizer = optimizer, loss = tf.keras.losses.BinaryCrossentropy(), metrics=['accuracy'])
>       model.fit(x, y, epochs = 2, batch_size=32, callbacks=callbacks, validation_data=(x, y))

discriminative_layer_training_test.py:313:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../../.local/lib/python3.8/site-packages/keras/engine/training.py:1230: in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
../../../../.local/lib/python3.8/site-packages/keras/callbacks.py:413: in on_epoch_end
    callback.on_epoch_end(epoch, logs)
discriminative_layer_training_test.py:284: in on_epoch_end
    self.model.save(tmpdir + 'epoch-' + str(epoch + 1) + '-' + self.model._name + '.h5',overwrite=True,include_optimizer=True,)
../../../../.local/lib/python3.8/site-packages/keras/engine/training.py:2145: in save
    save.save_model(self, filepath, overwrite, include_optimizer, save_format,
../../../../.local/lib/python3.8/site-packages/keras/saving/save.py:145: in save_model
    hdf5_format.save_model_to_hdf5(
../../../../.local/lib/python3.8/site-packages/keras/saving/hdf5_format.py:113: in save_model_to_hdf5
    f.attrs[k] = json.dumps(
/usr/lib/python3.8/json/__init__.py:234: in dumps
    return cls(
/usr/lib/python3.8/json/encoder.py:199: in encode
    chunks = self.iterencode(o, _one_shot=True)
/usr/lib/python3.8/json/encoder.py:257: in iterencode
    return _iterencode(o, 0)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

obj =

    def get_json_type(obj):
      """Serializes any object to a JSON-serializable structure.

      Args:
          obj: the object to serialize

      Returns:
          JSON-serializable structure representing `obj`.

      Raises:
          TypeError: if `obj` cannot be serialized.
      """
      # if obj is a serializable Keras class instance
      # e.g. optimizer, layer
      if hasattr(obj, 'get_config'):
        return {'class_name': obj.__class__.__name__,
                'config': obj.get_config()}

      # if obj is any numpy type
      if type(obj).__module__ == np.__name__:
        if isinstance(obj, np.ndarray):
          return obj.tolist()
        else:
          return obj.item()

      # misc functions (e.g. loss function)
      if callable(obj):
        return obj.__name__

      # if obj is a python 'type'
      if type(obj).__name__ == type.__name__:
        return obj.__name__

      if isinstance(obj, tf.compat.v1.Dimension):
        return obj.value

      if isinstance(obj, tf.TensorShape):
        return obj.as_list()

      if isinstance(obj, tf.DType):
        return obj.name

      if isinstance(obj, collections.abc.Mapping):
        return dict(obj)

      if obj is Ellipsis:
        return {'class_name': '__ellipsis__'}

      if isinstance(obj, wrapt.ObjectProxy):
        return obj.__wrapped__

      if isinstance(obj, tf.TypeSpec):
        try:
          type_spec_name = type_spec.get_name(type(obj))
          return {'class_name': 'TypeSpec', 'type_spec': type_spec_name,
                  'serialized': obj._serialize()}  # pylint: disable=protected-access
        except ValueError:
          raise ValueError('Unable to serialize {} to JSON, because the TypeSpec '
                           'class {} has not been registered.'
                           .format(obj, type(obj)))
      if isinstance(obj, enum.Enum):
        return obj.value

>     raise TypeError('Not JSON Serializable:', obj)
E     TypeError: ('Not JSON Serializable:', )

../../../../.local/lib/python3.8/site-packages/keras/saving/saved_model/json_utils.py:142: TypeError
=================================================== warnings summary ===================================================
../../../../.local/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py:22
  /home/maifee/.local/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================== short test summary info ================================================
FAILED discriminative_layer_training_test.py::test_serialization_after_training[cpu] - TypeError: ('Not JSON Serializ...
============================================= 1 failed, 1 warning in 2.00s =============================================
```

suedez commented 1 year ago

2719

I tried this on TF-2.6.2 and rewrote the discriminative_layer_training.py file as described here: https://github.com/JackWindows/tf-addons/blob/1579bb1938c640fc36b224c3d903bfd418a3d40e/tensorflow_addons/optimizers/discriminative_layer_training.py#L146-L154. But it still raised the 'Not JSON Serializable' error. Is there anything I can do on TF-2.6.2 to solve this issue? TF-2.6 is required by another package in my task, so I have to stick with it. P.S. Everything works if I choose not to save the optimizer state; otherwise it fails.