keras.layers.IntegerLookup fails to deserialize vocubulary

jeisinge commented 3 years ago

See https://github.com/tensorflow/tensorflow/issues/52325 .

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): 2.6
Python version: 3.7

Describe the current behavior When creating a keras.layers.IntegerLookup, we provide a vocabulary. This is saved off via serialization. However, upon deserialization, the vocabulary is not loaded correctly --- the layer continues to work, however, one cannot serialize again.

Describe the expected behavior We should be able to serialize and deserialize IntegerLookups any number of times.

Contributing

Do you want to contribute a PR? (yes/no): no

Standalone code to reproduce the issue Provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/Jupyter/any notebook.

To reproduce, see https://colab.research.google.com/drive/1tpXdEsfYKyax5QhYLR7KAV4P2vQHIT8D?usp=sharing . It is interesting to note that I cannot reproduce by just serializing to JSON --- I have to serialize to SavedModel.

Other info / logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

From looking at the code, https://github.com/keras-team/keras/blob/v2.6.0/keras/layers/preprocessing/index_lookup.py#L200 , it appears that we are setting vocabulary, but not input_vocabulary. Since vocabulary is set, the layer works fine. But, upon the next serialization, we do serialize an empty vocab: https://github.com/keras-team/keras/blob/v2.6.0/keras/layers/preprocessing/index_lookup.py#L333 .

It is not clear to me the difference between input_vocabulary and vocabulary.

This is also related to https://github.com/tensorflow/tensorflow/issues/43834

tomerk commented 3 years ago

@mattdangerw I think this is a preprocessing layer? Could you take a quick look?

cadama commented 2 years ago

I've also run into this problem. Any temporary patch for this?

mattdangerw commented 2 years ago

Took a look. We currently don't support saving -> loading -> cloning as path with the lookup layers because of saving vocabulary files. In this case, we don't want to leak the old config file name to new models which might be restored in a new environment where the vocab file name is not available.

I think there's a few we should probably do here:

1) We could support the saving -> loading -> cloning workflow for non-file vocabularies. 2) Ideally we can error out helpfully when attempting to get run get_config on a lookup layer with a file vocabulary that has been saved and restored. 3) Lastly, an "everything works" solution would involve updating any vocabulary file paths in a lookup layer config to the saved model asset path for the vocab file on save/load. I'm not sure we have all the APIs we would need to support this today.

I'm not sure if 3) will be immediately actionable, but 1) and 2) should be things we could start on.

jeisinge commented 2 years ago

I don't believe there is a vocab file. It's a python list.

mattdangerw commented 2 years ago

Yeah, in the colab you posted, 1) would be a sufficient fix.

jeisinge commented 2 years ago

Oops --- I completely misread your message. It all makes sense. I apologize for my errant comment.

SuryanarayanaY commented 1 year ago

Hello, Thank you for reporting an issue.

We're currently in the process of migrating the new Keras 3 code base from keras-team/keras-core to keras-team/keras. Consequently, This issue may not be relevant to the Keras 3 code base . After the migration is successfully completed, feel free to reopen this Issue at keras-team/keras if you believe it remains relevant to the Keras 3 code base. If instead this Issue is a bug or security issue in legacy tf.keras, you can instead report a new issue at keras-team/tf-keras, which hosts the TensorFlow-only, legacy version of Keras.

To know more about Keras 3, please read https://keras.io/keras_core/announcement/

google-ml-butler[bot] commented 1 year ago

Are you satisfied with the resolution of your issue? Yes No

jeisinge commented 1 year ago

It's still an issue. See originally linked notebook.

Successfully installed keras-2.14.0 tensorboard-2.14.0 tensorflow-2.14.0rc1 tensorflow-estimator-2.14.0 wrapt-1.14.1

jeisinge commented 1 year ago

After upgrading to keras_core, it appears to be working. We can close this one out.

keras-team / keras

keras.layers.IntegerLookup fails to deserialize vocubulary #15508