amil-rp-work closed this issue 3 years ago.
Hello again. If you want to serve the model through TFX by exporting SavedModels and running them on TF ModelServer on GPU instances, then the only two optimization options you can try are using a high batch size (as allowed by your available VRAM) and building with the --predictiononly flag, which builds the generator so that it only outputs the predicted frames.
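If it helps, here is a minimal, hypothetical sketch for probing how large a batch the exported generator can handle on your GPU before you put it behind ModelServer. The export path and the assumption that all non-batch dimensions in the signature are static are mine, not something guaranteed by build.py; print the signature and adjust the names/shapes to what was actually exported.

```python
import tensorflow as tf

loaded = tf.saved_model.load("exported/generator/1")   # assumed export path
infer = loaded.signatures["serving_default"]
print(infer.structured_input_signature)                # shows the real input names/shapes

for batch in (1, 2, 4, 8, 16, 32):
    try:
        # Build zero-filled dummy inputs for every declared input,
        # replacing the batch dimension with the size being probed.
        feed = {
            name: tf.zeros([batch] + list(spec.shape[1:]), dtype=spec.dtype)
            for name, spec in infer.structured_input_signature[1].items()
        }
        infer(**feed)
        print(f"batch size {batch}: fits in VRAM")
    except tf.errors.ResourceExhaustedError:
        print(f"batch size {batch}: out of GPU memory")
        break
```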
As for making the model servable, the key thing you'll have to decide is whether you want to perform the animation process itself on the service-consuming client or as part of the TFX pipeline.

In the former case, the advantage is that on the TFX side all you have to do is expose the ModelServers. The downside is that you'll have to reimplement the animation process in the client, and if you later change how the models handle the results of each step, you'll have to push updates to clients or risk the client-side animation code becoming incompatible with the served models.

In the latter case, the advantage is that the client only has to pass the source image and the driving video to the pipeline, and everything else is handled on the TFX side, so the risk of an incompatibility appearing in client-side code is slim to none. The downside is that you'll no longer be able to just expose the ModelServers serving the SavedModels; instead you'll have to implement the animation process as part of the inference pipeline (as a TFX custom component).
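For the client-side route, a rough sketch of what the per-frame loop could look like against TF ModelServer's REST API is below. The model names ("kp_detector", "generator"), the input/output keys, and the single-output assumption for the keypoint detector are all placeholders of mine; they would have to mirror the signatures that build.py actually exports, and any relative-motion normalization would also have to live in this loop.

```python
import requests

SERVER = "http://localhost:8501/v1/models"  # default TF Serving REST port

def predict(model, inputs):
    # TF Serving REST API: POST /v1/models/<name>:predict with an "inputs" dict.
    resp = requests.post(f"{SERVER}/{model}:predict", json={"inputs": inputs})
    resp.raise_for_status()
    return resp.json()["outputs"]

def animate(source_image, driving_frames):
    """source_image: nested list shaped [1, H, W, 3]; driving_frames: list of the same."""
    # Keypoints for the source image are computed once and reused for every frame.
    kp_source = predict("kp_detector", {"image": source_image})
    predictions = []
    for frame in driving_frames:
        kp_driving = predict("kp_detector", {"image": frame})
        # Any relative-motion normalization of kp_driving would happen here,
        # client-side, and must stay in sync with how the models were built.
        out = predict("generator", {
            "source_image": source_image,
            "kp_source": kp_source,
            "kp_driving": kp_driving,
        })
        predictions.append(out)
    return predictions
```

The incompatibility risk described above lives exactly in that loop: if the exported signatures or the per-step handling change, this client code has to change with them.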
Hey @lshug, I have copied the checkpoint from the PyTorch repo. Now I want to use TFX Serving to back a just-for-fun web app :). I had a few questions about it.
build.py
Any help/resources would be appreciated, thanks!