tensorflow / serving

A flexible, high-performance serving system for machine learning models
https://www.tensorflow.org/serving
Apache License 2.0

Support serving of custom model metadata #1248

Open hmeine opened 5 years ago

hmeine commented 5 years ago

Describe the problem the feature is intended to solve

We would like to serve fully convolutional segmentation models whose input and output tensor sizes are flexible, but not identical. In this setting, the model can only be used together with additional information, such as the required input padding per dimension. We would like to serve this metadata together with the model, so that switching between models becomes easy, clients do not need prior knowledge of this metadata, and all information lives in one place.

Describe the solution

There's already a metadata API, which according to the serving_basic documentation supports metadata "such as" signatures:

saved_model.pb is the serialized tensorflow::SavedModel. It includes one or more graph definitions of the model, as well as metadata of the model such as signatures.

This means that it should be possible to query metadata other than the signatures, and the API seems to allow that, but I could not find out how to do so. It would be nice if there were an example or some documentation snippet on this.
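
For concreteness, this is what querying the existing metadata endpoint looks like today (host, port and model name are placeholders); as far as I can see, the response never contains anything beyond the signature definitions:

import requests

# Query TF Serving's existing model metadata REST endpoint.
resp = requests.get("http://localhost:8501/v1/models/my_model/metadata").json()

# As far as I can tell, only the signatures are exposed here:
print(resp["metadata"].keys())  # expected: dict_keys(['signature_def'])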

Describe alternatives you've considered

grebdioZ commented 5 years ago

It would indeed be great if there were a way to serve user-defined (key/value) metadata. I asked the community about this at https://stackoverflow.com/questions/54114525/how-to-serve-custom-meta-data-for-a-tensorflow-model, but have not received an answer yet.

lilao commented 5 years ago

Currently, only signatures are supported as metadata. Can you store the metadata in the SavedModel?

hmeine commented 5 years ago

Yes, I think we can store the metadata in the SavedModel (see my last sentence above), but that did not seem to help very much with serving it?

misterpeddy commented 5 years ago

Sorry for the delay.

Unfortunately, adding logic to allow serving metadata other than SignatureDefs is not on the roadmap right now, and I'm not sure we have a good understanding of the general use case for which supporting this would make sense.

Regarding how to serve the metadata stored in the SavedModel: presumably, you'd add a constant to your graph holding the tensor value of interest (the input/output shape), create a new signature [1], and do inference with that signature -- I've never seen this done, but I can't imagine why it wouldn't work.

[1]https://www.tensorflow.org/guide/saved_model#manually_build_a_savedmodel
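
Something like this minimal, untested sketch (TF 2.x style; the module, the stand-in model and the constant values are only illustrative):

import tensorflow as tf

class SegmentationModule(tf.Module):
    """Wraps a segmentation model and exposes its metadata as an extra signature."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    @tf.function(input_signature=[tf.TensorSpec([None, None, None, 1], tf.float32)])
    def serve(self, image):
        return {"segmentation": self.model(image)}

    @tf.function(input_signature=[tf.TensorSpec([], tf.int32)])
    def metadata(self, _unused):
        # The values of interest are baked into the graph as constants;
        # the dummy input keeps the signature callable via ordinary Predict requests.
        return {
            "input_shape": tf.constant([572, 572], tf.int32),
            "output_shape": tf.constant([388, 388], tf.int32),
        }

# Stand-in for a real fully convolutional segmentation network.
backbone = tf.keras.Sequential([tf.keras.layers.Conv2D(1, 3, activation="sigmoid")])
module = SegmentationModule(backbone)

tf.saved_model.save(
    module,
    "/models/segmenter/1",
    signatures={
        "serving_default": module.serve,
        "metadata": module.metadata,
    },
)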

al-muammar commented 5 years ago

@unclepeddy, the use case for that is serving metadata closely related to the model. Say, the name of the datasets it was trained on, or the name of the class ontology that was used during training.

If one can't store this information in the metadata of the model itself, one has to create a table in a database for storing {version: metadata} information and be quite disciplined about how the models are updated. Having one source of truth is always better.

As for creating a constant tensor for serving this data... It's super hacky, tbh.

misterpeddy commented 5 years ago

@martinz is this a use case that ml metadata could be useful for?

hmeine commented 5 years ago

What is "ml metadata"? (It sounds potentially relevant, indeed.)

hmeine commented 5 years ago

@unclepeddy, the use case for that is serving metadata closely related to the model. Say, the name of the datasets it was trained on, or the name of the class ontology that was used during training.

Indeed, that is useful metadata which we also store. However, let me rephrase the use case from my initial issue description a little more specifically:

Consider the popular U-Net architecture for segmentation. If you look at the original Ronneberger/Fischer/Brox publication, you will see that the CNN is fully convolutional and takes a 572² image, producing a 388² segmentation map. That means that in order to use such models, assuming they're served by TFS, one needs to know that the desired output region has to be padded with 92 pixels along all borders (to account for the difference: 92 + 388 + 92 = 572).

I would like to serve metadata such as a JSON object that encodes these architecture properties:

{ "padding" : [92, 92], "minimum_output_size" : [4, 4], "size_offset" : [16, 16] }
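
To illustrate how a client would use this, here is a hypothetical helper (not an existing API) that turns the metadata into the input tile size required for a desired output region:

def input_size_for_output(desired_output, metadata):
    """Round the desired output size up to the next valid size and add the padding."""
    sizes = []
    for want, pad, min_out, step in zip(desired_output,
                                        metadata["padding"],
                                        metadata["minimum_output_size"],
                                        metadata["size_offset"]):
        # Valid output sizes are minimum_output_size + k * size_offset.
        k = max(0, -(-(want - min_out) // step))  # ceiling division
        valid_output = min_out + k * step
        sizes.append(valid_output + 2 * pad)  # padding on both sides
    return sizes

meta = {"padding": [92, 92], "minimum_output_size": [4, 4], "size_offset": [16, 16]}
print(input_size_for_output([388, 388], meta))  # -> [572, 572], as in the U-Net paper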

al-muammar commented 5 years ago

@hmeine, have you read about TF Transform?

misterpeddy commented 5 years ago

@hmeine Thanks for the concrete use case - it makes sense, and if you don't mind, I'd like to understand the setup a little more deeply to validate whether serving custom metadata is the correct solution here.

In general, there are two ways clients interact with models on TF Serving: with a human in the loop, and programmatically.

Please do let me know your thoughts and if I'm missing something about your use case.

grebdioZ commented 5 years ago

@unclepeddy Regarding the second case: I don't think the client usually needs to call this endpoint before every inference.

This is the workflow as I envision it (see also the sketch after the list):

  1. A user (whether human or programmatic) asks the server which models are available, receives meta-information that allows them to check for suitable ones, e.g. "RGB car detector", and picks one.
  2. The programmatic client queries the selected model for the metadata, which includes the settings @hmeine mentioned (padding etc.), so that it can form all of its future inference requests correctly.
  3. The client sends a bunch of consecutive inference requests and simply assumes the model is not removed while it is doing so.
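
Sketched against TF Serving's current REST API (steps 1 and 3 use endpoints that exist today; the "custom_metadata" field in step 2 is exactly the hypothetical part this issue asks for):

import requests

BASE = "http://localhost:8501/v1/models/unet"  # model name assumed to be known

# 1. Ask which versions of the model are being served (existing status endpoint).
status = requests.get(BASE).json()
version = status["model_version_status"][0]["version"]

# 2. Fetch the metadata once; today the response only contains signature_def,
#    so the "custom_metadata" key below is hypothetical.
meta = requests.get(f"{BASE}/versions/{version}/metadata").json()
padding = meta.get("metadata", {}).get("custom_metadata", {}).get("padding", [92, 92])

# 3. Send consecutive inference requests, padded according to the metadata.
tile = [[[0.0]] * (388 + 2 * padding[1])] * (388 + 2 * padding[0])  # dummy 572x572x1 tile
result = requests.post(f"{BASE}/versions/{version}:predict",
                       json={"instances": [tile]}).json()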

Obviously there is the possibility that the model is removed or updated between two inference requests. However, I don't consider this a big issue:

If the model is removed: the problem is the same as without metadata - the model is gone and the request will fail. So nothing new here.

If a new model version is uploaded: This is only a new problem if all of the following apply:

I don't see any urgent need to actively send configs to clients when a new model is uploaded.

al-muammar commented 5 years ago

@unclepeddy, making two requests instead of one is not such a bad option. The metadata request is really lightweight compared to the neural network computations; it adds some latency that can be ignored. There is still the problem that the model can change between these two requests, but that will happen very rarely, and it's OK for a startup.

Secondly, regarding a config distribution mechanism: startups usually don't have infrastructure like Google or Facebook. And even if they do, it still makes sense to distribute your models along with their metadata. Decoupling the two is bad and creates headaches.

Assume the following situation: you added a model A at the serving path path/to/serving/directory/5/ and updated your globally distributed config. Then you decided that this model was bad, deleted it, and uploaded another one to the same path, but forgot to update the config.

Everything will look valid; however, your metadata will be wrong.

hmeine commented 5 years ago

@grebdioZ answered just as I would have. In particular, the extra request is only needed once per model, not per subsequent tile request.

Again, when thinking about this, it makes sense not to consider image classification, but pixel classification models (e.g. "semantic segmentation" tasks), which can be applied to arbitrarily large inputs. @Jihadik's pointer to TF transform does not apply here, since in this case the "padding" needs to come from the original image and cannot be added by the model itself.

W.r.t. the discussion of model changes, I would envision not using "latest" requests, but initially determining the latest available model version, which should prevent any problems.

al-muammar commented 5 years ago

@hmeine, regarding TF Transform: there are some slice-into-patches operations which can take arbitrarily sized images and slice them into fixed-size patches.

hmeine commented 5 years ago

@hmeine, regarding TF Transform: there are some slice-into-patches operations which can take arbitrarily sized images and slice them into fixed-size patches.

Without looking at the API, that would still require sending the full image. We have many use cases where one wants to perform local inference. Again, that's interesting info, but it does not tackle our issue.

zldrobit commented 3 years ago

I would like to chime in. It would be great if SavedModels could have an analog of TFLite metadata: https://tensorflow.google.cn/lite/convert/metadata. By the way, TFLite models are in FlatBuffer format.