Closed jeisinge closed 1 year ago
@mattdangerw I think this is a preprocessing layer? Could you take a quick look?
I've also run into this problem. Any temporary patch for this?
Took a look. We currently don't support saving -> loading -> cloning
as path with the lookup layers because of saving vocabulary files. In this case, we don't want to leak the old config file name to new models which might be restored in a new environment where the vocab file name is not available.
I think there's a few we should probably do here:
1) We could support the saving -> loading -> cloning
workflow for non-file vocabularies.
2) Ideally we can error out helpfully when attempting to get run get_config
on a lookup layer with a file vocabulary that has been saved and restored.
3) Lastly, an "everything works" solution would involve updating any vocabulary file paths in a lookup layer config to the saved model asset path for the vocab file on save/load. I'm not sure we have all the APIs we would need to support this today.
I'm not sure if 3) will be immediately actionable, but 1) and 2) should be things we could start on.
I don't believe there is a vocab file. It's a python list.
Yeah, in the colab you posted, 1) would be a sufficient fix.
Oops --- I completely misread your message. It all makes sense. I apologize for my errant comment.
Hello, Thank you for reporting an issue.
We're currently in the process of migrating the new Keras 3 code base from keras-team/keras-core to keras-team/keras. Consequently, This issue may not be relevant to the Keras 3 code base . After the migration is successfully completed, feel free to reopen this Issue at keras-team/keras if you believe it remains relevant to the Keras 3 code base. If instead this Issue is a bug or security issue in legacy tf.keras, you can instead report a new issue at keras-team/tf-keras, which hosts the TensorFlow-only, legacy version of Keras.
To know more about Keras 3, please read https://keras.io/keras_core/announcement/
It's still an issue. See originally linked notebook.
Successfully installed keras-2.14.0 tensorboard-2.14.0 tensorflow-2.14.0rc1 tensorflow-estimator-2.14.0 wrapt-1.14.1
After upgrading to keras_core
, it appears to be working. We can close this one out.
See https://github.com/tensorflow/tensorflow/issues/52325 .
System information
Describe the current behavior When creating a keras.layers.IntegerLookup, we provide a vocabulary. This is saved off via serialization. However, upon deserialization, the vocabulary is not loaded correctly --- the layer continues to work, however, one cannot serialize again.
Describe the expected behavior We should be able to serialize and deserialize IntegerLookups any number of times.
Contributing
Standalone code to reproduce the issue Provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/Jupyter/any notebook.
To reproduce, see https://colab.research.google.com/drive/1tpXdEsfYKyax5QhYLR7KAV4P2vQHIT8D?usp=sharing . It is interesting to note that I cannot reproduce by just serializing to JSON --- I have to serialize to SavedModel.
Other info / logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
From looking at the code, https://github.com/keras-team/keras/blob/v2.6.0/keras/layers/preprocessing/index_lookup.py#L200 , it appears that we are setting
vocabulary
, but notinput_vocabulary
. Sincevocabulary
is set, the layer works fine. But, upon the next serialization, we do serialize an empty vocab: https://github.com/keras-team/keras/blob/v2.6.0/keras/layers/preprocessing/index_lookup.py#L333 .It is not clear to me the difference between
input_vocabulary
andvocabulary
.This is also related to https://github.com/tensorflow/tensorflow/issues/43834