Open GolanLevy opened 1 year ago
This is a bit tricky.
We don't want to drop support for Keras models. Requiring users to convert possibly hundreds or thousands of Keras models to TensorFlow before deploying them may not be practical.
We could possibly have two images, as you suggested: a smaller one without the conversion script and a larger one with it. We would need to introduce an install/deployment option in the modelmesh-serving repo.
Users who choose the slim image would then be required to do the Keras-to-TensorFlow conversion before deploying an InferenceService (ISVC).
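To give a sense of what that would mean for slim-image users, here is a rough sketch of a pre-deployment check that lists the model files still needing conversion. It is not part of the adapter. The function name `find_keras_models` is hypothetical, and it assumes Keras models can be recognised by the file extensions Keras commonly saves to (`.h5`, `.hdf5`, `.keras`), which may not match how the adapter itself detects them.

```python
from pathlib import Path

# Assumption: a Keras model is identified purely by its file extension.
# The adapter may use a different rule (for example the declared model type).
KERAS_SUFFIXES = {".h5", ".hdf5", ".keras"}


def find_keras_models(root):
    """Return the files under `root` whose extension suggests a Keras model.

    Hypothetical helper for illustration only: these are the files a user
    would have to convert to a TensorFlow model before using a slim image.
    """
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in KERAS_SUFFIXES
    )
```

Calling `find_keras_models("/path/to/models")` on a local copy of the model storage returns the candidates. An empty list would suggest nothing needs converting under the extension assumption above.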
The current image is very large (2.14 GB), which slows down the predictor's startup.
Please correct me if I'm wrong, but the only reason the adapter installs TensorFlow is to convert Keras models to TensorFlow models. Doing that at runtime rather than in advance seems odd. See:
https://github.com/kserve/modelmesh-runtime-adapter/blob/f9781d287d31ec40c7c3eb77d5ac12eb68622aaa/model-mesh-triton-adapter/server/utils.go#L63-L64
https://github.com/kserve/modelmesh-runtime-adapter/blob/f9781d287d31ec40c7c3eb77d5ac12eb68622aaa/Dockerfile#L145
https://github.com/kserve/modelmesh-runtime-adapter/blob/f9781d287d31ec40c7c3eb77d5ac12eb68622aaa/Dockerfile#L164
https://github.com/kserve/modelmesh-runtime-adapter/blob/f9781d287d31ec40c7c3eb77d5ac12eb68622aaa/Dockerfile#L172
If we remove this option, we can remove the TensorFlow installation, and since Python is needed only for that, we can remove the entire Python installation as well. This reduces the image size from 2.14 GB to 256 MB.
Can we just remove it? If not, can we have two images, the original one and a new slim one?