kserve / modelmesh-runtime-adapter

Unified runtime-adapter image of the sidecar containers which run in the modelmesh pods
Apache License 2.0

Reduce size of runtime-adapter image (exclude Python/tensorflow to convert keras models) #59

Open GolanLevy opened 1 year ago

GolanLevy commented 1 year ago

The current image is very large (2.14 GB), which slows down the predictor's startup time.

Correct me if I'm wrong, but the only reason the adapter needs to install TensorFlow is to convert Keras models to TensorFlow models, which seems odd to do at runtime rather than in advance. See:

https://github.com/kserve/modelmesh-runtime-adapter/blob/f9781d287d31ec40c7c3eb77d5ac12eb68622aaa/model-mesh-triton-adapter/server/utils.go#L63-L64

https://github.com/kserve/modelmesh-runtime-adapter/blob/f9781d287d31ec40c7c3eb77d5ac12eb68622aaa/Dockerfile#L145

https://github.com/kserve/modelmesh-runtime-adapter/blob/f9781d287d31ec40c7c3eb77d5ac12eb68622aaa/Dockerfile#L164

https://github.com/kserve/modelmesh-runtime-adapter/blob/f9781d287d31ec40c7c3eb77d5ac12eb68622aaa/Dockerfile#L172

If we remove this option, we can remove the TensorFlow installation and, since Python is needed only for that, the entire Python installation as well. This reduces the image size from 2.14 GB to 256 MB.

Can we just remove it? If not, can we have two images, the original one and a new slim one?

ckadner commented 10 months ago

This is a bit tricky.

We don't want to drop support for Keras models. Requiring users to convert possibly hundreds or thousands of Keras models to TensorFlow before deploying them may not be practical.

We could possibly have two images as you suggested: a smaller one without the conversion script and a larger one with it. We would need to introduce an install/deployment option in the modelmesh-serving repo.

Users who decide to use the slim image would then be required to do the Keras to TF conversion prior to deploying an ISVC.
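For illustration, the offline conversion step for slim-image users could look roughly like the sketch below. This is not the adapter's actual conversion script. It assumes TensorFlow 2.x with a Keras version whose loaded models are accepted by `tf.saved_model.save` (newer Keras releases may need `model.export` instead), and it assumes the output should follow Triton's `<model name>/<version>/model.savedmodel` layout. The helper names `saved_model_dir` and `convert`, and the choice to derive the model name from the file name, are hypothetical.

```python
"""Offline Keras -> TensorFlow SavedModel conversion (illustrative sketch)."""
import sys
from pathlib import Path


def saved_model_dir(keras_file: str, out_root: str, version: int = 1) -> Path:
    """Return <out_root>/<model name>/<version>/model.savedmodel.

    The model name is taken from the Keras file name, which is an
    assumption made for this sketch.
    """
    name = Path(keras_file).stem
    return Path(out_root) / name / str(version) / "model.savedmodel"


def convert(keras_file: str, out_root: str) -> Path:
    """Load a Keras model file and write it out as a SavedModel directory."""
    # Imported lazily so that only the conversion step needs TensorFlow.
    import tensorflow as tf

    target = saved_model_dir(keras_file, out_root)
    target.parent.mkdir(parents=True, exist_ok=True)

    model = tf.keras.models.load_model(keras_file)
    # Assumption: this Keras/TensorFlow combination supports saving the
    # loaded model this way; otherwise model.export(str(target)) may apply.
    tf.saved_model.save(model, str(target))
    return target


# Run the conversion only when both arguments are supplied on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    print(convert(sys.argv[1], sys.argv[2]))
```

Saved under a hypothetical name such as `convert_keras.py`, this could be run once per model (for example `python convert_keras.py mnist.h5 ./models`) on a machine that has TensorFlow installed, so the serving image itself would not need Python or TensorFlow.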