Open hmeine opened 5 years ago
It would indeed be great if there were a way to serve user-defined (key/value) metadata. I asked the community about this at https://stackoverflow.com/questions/54114525/how-to-serve-custom-meta-data-for-a-tensorflow-model, but have gotten no answer yet.
Currently, only signatures are supported as metadata. Can you store the metadata in the SavedModel?
Yes, I think we can store the metadata in the SavedModel (see my last sentence above), but that did not seem to help very much with serving it?
Sorry for the delay.
Unfortunately, adding logic to serve metadata other than SignatureDefs is not on the roadmap right now, and I'm not sure we have a good understanding of the general use case for which supporting this would make sense.
Regarding how to serve the metadata stored in the saved model: presumably, you'd add a constant to your graph holding the tensor value of interest (the input/output shape), create a new signature [1], and do inference with that signature. I've never seen this done, but I can't imagine why it wouldn't work; a sketch follows below.
[1] https://www.tensorflow.org/guide/saved_model#manually_build_a_savedmodel
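For what it's worth, here is a minimal sketch of that approach, assuming TF 2.x; the class name, export path, and metadata contents are hypothetical:

```python
import tensorflow as tf

class SegmenterWithMetadata(tf.Module):

    def __init__(self):
        super().__init__()
        # Bake the (hypothetical) metadata into the graph as a constant
        # string tensor, so it travels with the SavedModel itself.
        self._metadata = tf.constant(
            '{"padding": [92, 92], "minimum_output_size": [4, 4], '
            '"size_offset": [16, 16]}')

    @tf.function(input_signature=[])
    def metadata(self):
        # Served like any other signature; a client "infers" the constant.
        return {"metadata": self._metadata}

module = SegmenterWithMetadata()
tf.saved_model.save(module, "/tmp/unet/1",
                    signatures={"metadata": module.metadata})
```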
@unclepeddy, the use case for that is serving metadata closely related to the model. Say, the names of the datasets it was trained on, or the name of the class ontology that was used during training.
If one can't store this information in the metadata of the model itself, one has to create a table in a database for storing {version: metadata}
pairs and be pretty disciplined about how the models are updated. Having one source of truth is always better.
As for creating a constant tensor for serving this data... It's super hacky, tbh.
@martinz is this a use case that ml metadata could be useful for?
What is "ml metadata"? (It sounds potentially relevant, indeed.)
> @unclepeddy, the use case for that is serving metadata closely related to the model. Say, the names of the datasets it was trained on, or the name of the class ontology that was used during training.
Indeed, that is useful metadata which we also store. However, let me rephrase the use case from my initial issue description a little more specifically:
Consider the popular U-net architecture for segmentation. If you look at the original Ronneberger/Fischer/Brox publication, you see that the CNN is fully convolutional and takes a 572² image, producing a 388² segmentation map. That means that in order to use models like this, assuming they're served by TFS, one needs to know that the desired output region has to be padded by 92 pixels along all borders to account for the difference (92 + 388 + 92 = 572).
I would like to serve metadata such as a JSON object that encodes these architecture properties:
{ "padding" : [92, 92], "minimum_output_size" : [4, 4], "size_offset" : [16, 16] }
@hmeine, have you read about TF Transform?
@hmeine Thanks for the concrete use case - it makes sense, and if you don't mind, I'd like to understand the setup a little more deeply to validate whether serving custom metadata is the correct solution here.
In general, there are two ways clients interact with models on TF Serving: with a human in the loop, and programmatically.
Please do let me know your thoughts and if I'm missing something about your use case.
@unclepeddy Regarding the second case: I don't think the client usually needs to call this endpoint before every inference.
This is the workflow as I envision it: the client requests the metadata once when it starts using a model (version), caches it, and then sends any number of inference requests.
Obviously there is the possibility that the model is removed or updated between two inference requests. However, I don't consider this a big issue:
If the model is removed: The problem is the same as without metadata - the model is gone, and the request will fail. So nothing new here.
If a new model version is uploaded: This is only a new problem if the metadata actually changed between the versions and the client keeps using the values it cached for the old version.
I don't see any urgent need to actively send configs to clients when a new model is uploaded.
@unclepeddy, making two requests instead of one is not such a bad option. The metadata request is really lightweight compared to neural network computations. It adds some latency, which can be ignored. There is still the problem that a model can be changed between these two requests, but that will happen very rarely, and it's OK for a startup.
Secondly, regarding a config distribution mechanism: usually, startups don't have infrastructure like Google or Facebook. And even if they do, it still makes sense to distribute your models along with their metadata. Decoupling them is bad and creates headaches.
Assume the following situation. You added a model A to a serving path path/to/serving/directory/5/ and updated your globally distributed config. Then you decided that this model is bad, deleted it, and uploaded another one at the same path, but forgot to update the config. Everything will look valid, yet your metadata will be wrong.
@grebdioZ answered along the lines I had in mind. In particular, the extra request is only needed once (per model, not per subsequent tile request).
Again, when thinking about this, it makes sense to consider not image classification, but pixel classification models (e.g. "semantic segmentation" tasks), which can be applied to arbitrarily large inputs. @Jihadik's pointer to TF Transform does not apply here, since in this case the "padding" needs to come from the original image and cannot be added by the model itself.
W.r.t. the discussion of model changes, I would envision not using "latest" requests, but initially determining the latest available model version and pinning to it, which should prevent any problems.
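A sketch of that workflow against TF Serving's REST API; host, port, model name, and input are placeholders:

```python
import requests

BASE = "http://localhost:8501/v1/models/unet"  # hypothetical server and model

# One cheap metadata request per model, not per inference...
info = requests.get(f"{BASE}/metadata").json()
version = info["model_spec"]["version"]

# ...then pin all inference requests to that exact version, so a model
# update between the two calls cannot silently change the model underneath.
result = requests.post(
    f"{BASE}/versions/{version}:predict",
    json={"instances": [[0.0, 0.0]]},  # placeholder input
).json()
```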
@hmeine, regarding TF Transform, there are operations for slicing into patches, which can take arbitrary images and slice them into fixed-size patches.
> @hmeine, regarding TF Transform, there are operations for slicing into patches, which can take arbitrary images and slice them into fixed-size patches.
Without looking at the API, that would still require sending the full image. We have many use cases where one wants to perform local inference. Again, that's interesting info, but not tackling our issue.
I would like to chime in. It would be great if SavedModels could have an analog of TFLite metadata (https://tensorflow.google.cn/lite/convert/metadata). By the way, TFLite models use the FlatBuffer format.
Describe the problem the feature is intended to solve
We would like to serve fully convolutional segmentation models whose input and output tensor sizes are flexible, but not identical. In this setting, the model can only be used together with additional information, such as the necessary input padding per dimension. We would like to be able to serve this metadata together with the model, so that it becomes easy to switch between models, clients don't need hard-coded knowledge of it, and all information is in one place.
Describe the solution
There's already a metadata API which, according to the serving_basic documentation, supports metadata "such as" signatures.
This suggests that it should be possible to query metadata other than the signatures, and the API seems to allow that, but I could not find out how to do so. It would be nice if there were an example or some documentation snippet on this.
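For reference, a sketch of how the existing metadata API is queried over gRPC today; host and model name are placeholders:

```python
import grpc
from tensorflow_serving.apis import get_model_metadata_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")  # assumed gRPC address
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = get_model_metadata_pb2.GetModelMetadataRequest()
request.model_spec.name = "unet"  # hypothetical model name
# "signature_def" is the only metadata field the server handles today; the
# repeated metadata_field suggests other fields could be requested the same way.
request.metadata_field.append("signature_def")

response = stub.GetModelMetadata(request, timeout=5.0)
print(response.metadata["signature_def"])
```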
Describe alternatives you've considered