Open znaeem opened 3 years ago
Thanks for the question! I have some workaround below but I agree that the ergonomics here aren't great. I'll investigate further and see if this can be made better.
The reason saving isn't working here is that the train/test step uses a retrieval metric defined over a dataset of movies. This isn't really saveable to SavedModel, since the test dataset is not included in the resulting export file.
Here are a couple of suggestions on how to deal with this:

1. Save only the weights, via the ModelCheckpoint callback or .save_weights/.load_weights. Because this saves only the weights of the model, it doesn't run afoul of the problem.
2. If you'd like to use model.save, you need to unset the metrics from self.retrieval_task before saving. For example, train the model as before, but call the following before calling model.save:

model.retrieval_task = tfrs.tasks.Retrieval()  # Removes the metrics.
model.compile()
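To illustrate the first suggestion, here's a minimal sketch using a plain Keras model as a stand-in for the TFRS model (the model, layer sizes, and file path are all made up for illustration):

```python
import os
import tempfile
import tensorflow as tf

# A stand-in for the TFRS model; any Keras model works the same way.
model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
model.build((None, 3))

# Workaround 1: save only the weights. Metrics (including a retrieval
# metric and its candidate dataset) are not serialized with the weights.
path = os.path.join(tempfile.mkdtemp(), "model.weights.h5")
model.save_weights(path)

# Later: rebuild the model from code, then restore the weights into it.
restored = tf.keras.Sequential([tf.keras.layers.Dense(4)])
restored.build((None, 3))
restored.load_weights(path)
```

The key point is that load_weights needs the model rebuilt from the same code first; only the variable values are stored on disk.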
@maciejkula can we serve this saved model with TFServing similar to the BruteForce model? Same signature? Not at my keyboard now, but I can verify when I’m back.
@maciejkula Thank you for the quick response, I will try the steps you have outlined.
After saving, I'm seeing the signature below using saved_model_cli (along with an error, which I'm ignoring for now).
Note: I'm looking for a signature similar to the BruteForce model's, where I can pass in a user_id and receive a list of movie_titles using TF Serving. Is this possible with anything beyond the BruteForce approach?
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['movie_title'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: serving_default_movie_title:0
    inputs['user_id'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: serving_default_user_id:0
    inputs['user_rating'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: serving_default_user_rating:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 32)
        name: StatefulPartitionedCall:0
    outputs['output_2'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 32)
        name: StatefulPartitionedCall:1
    outputs['output_3'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: StatefulPartitionedCall:2
  Method name is: tensorflow/serving/predict
Traceback (most recent call last):
  File "/opt/conda/bin/saved_model_cli", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/tools/saved_model_cli.py", line 1185, in main
    args.func(args)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/tools/saved_model_cli.py", line 715, in show
    _show_all(args.dir)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/tools/saved_model_cli.py", line 307, in _show_all
    _show_defined_functions(saved_model_dir)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/tools/saved_model_cli.py", line 187, in _show_defined_functions
    trackable_object = load.load(saved_model_dir)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py", line 603, in load
    return load_internal(export_dir, tags, options)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py", line 633, in load_internal
    ckpt_options)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py", line 131, in __init__
    self._restore_checkpoint()
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py", line 330, in _restore_checkpoint
    load_status = saver.restore(variables_path, self._checkpoint_options)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py", line 1320, in restore
    checkpoint=checkpoint, proto_id=0).restore(self._graph_view.root)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py", line 209, in restore
    restore_ops = trackable._restore_from_checkpoint_position(self)  # pylint: disable=protected-access
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py", line 914, in _restore_from_checkpoint_position
    tensor_saveables, python_saveables))
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py", line 290, in restore_saveables
    tensor_saveables)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 361, in validate_and_slice_inputs
    _add_saveable(saveables, seen_ops, converted_saveable_object)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 331, in _add_saveable
    saveable.name)
ValueError: The same saveable will be restored with two names: user_model/layer_with_weights-0/_table/.ATTRIBUTES/table
@cfregly if you'd like to export a brute-force based model, you'll need to pick out the retrieval model subcomponents.
# Create a model that takes in raw query features, and
# recommends movies out of the entire movies dataset.
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index(movies.batch(100).map(model.movie_model), movies)

# Get recommendations.
_, titles = index(tf.constant(["42"]))

# Save as before.
index.save(...)
In doing so you'll have a model that's trained jointly but served individually.
Nothing beyond brute force yet, but this will change soon.
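For intuition, the brute-force layer essentially scores the query embedding against every candidate embedding and keeps the top k. A minimal sketch of that idea in plain TensorFlow, with random dummy embeddings standing in for the outputs of model.user_model and model.movie_model:

```python
import tensorflow as tf

# Dummy embeddings standing in for the learned towers:
# 5 candidate (movie) embeddings and 1 query (user) embedding, dim 4.
candidates = tf.random.normal([5, 4], seed=1)
query = tf.random.normal([1, 4], seed=2)

# Brute force: score the query against every candidate with a dot
# product, then keep the k highest-scoring candidates.
scores = tf.matmul(query, candidates, transpose_b=True)  # shape (1, 5)
top_scores, top_indices = tf.math.top_k(scores, k=3)
```

The BruteForce layer does the same thing, but also maps the winning indices back to the identifiers (movie titles) you passed to index().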
@maciejkula I have followed the steps above to try to export the brute force based model, but I get the following issue:
ValueError: Got non-flat/non-unique argument names for SavedModel signature 'serving_default': more than one argument to '__inference_signature_wrapper_5077' was named 'customer id'. Signatures have one Tensor per named input, so to have predictable names Python functions used to generate these signatures should avoid *args and Tensors in nested structures unless unique names are specified for each. Use tf.TensorSpec(..., name=...) to provide a name for a Tensor input.
I am not sure what the error means, but I think it has something to do with input names clashing. My model uses preprocessing layers that call .adapt() on the columns of the training dataset; these feed the functional models for the query and candidate towers before they are wrapped in their own model classes (like the QueryModel and CandidateModel in this guide).
I suspect that you're passing a nested dict of tensors into your function, and it has the customer_id key repeated more than once. When saving models, these nested structures are flattened, so having the same key twice will cause the saving to fail.
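As the error message itself suggests, one way to guarantee unique input names is to wrap the serving function in a tf.function with an explicit input signature, naming each tensor via tf.TensorSpec(..., name=...). A minimal sketch (the customer_id key and the identity body are placeholders for your real features and model):

```python
import tensorflow as tf

class ServingWrapper(tf.Module):
    # Explicit signature: one uniquely named TensorSpec per input key.
    @tf.function(input_signature=[{
        "customer_id": tf.TensorSpec([None], tf.string, name="customer_id"),
    }])
    def __call__(self, features):
        # Placeholder body; a real model would embed and score here.
        return features["customer_id"]

wrapper = ServingWrapper()
```

Exporting with tf.saved_model.save(wrapper, export_dir, signatures=wrapper.__call__) should then produce a serving_default signature whose input names come from the TensorSpecs rather than from flattened nested structures.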
I have the same issue when trying to save a BruteForce-based model. Does anyone have a workaround?
I was following the guide for the multi-task recommender found here, but when I tried to save using model.save(), I was unable to do so, getting the following error:
I also cannot save it in HDF5 format, but I believe that's because the model in question is a custom subclassing of the model class. What is the appropriate way to save the model?