adelton opened 10 months ago
Assuming both just ship a runtime without bundling the AI model, there seems to be some serious bloat in `bike-rentals-auto-ml`.
Can you elaborate? Both include the serving runtime and model.
The serving runtime image alone for `bike-rentals` is almost 900 MB compressed: `seldonio/mlserver:1.3.5-slim`. It is large, but supports a wide range of models: https://github.com/SeldonIO/MLServer#inference-runtimes
The `bike-rentals` image also contains all of the dependencies for the model: see #27 for `seldonio/mlserver` vs `openvino/mlserver`.
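As a quick way to verify those compressed sizes, a sketch like the following sums the per-layer sizes straight from the registry's image manifest. It is only illustrative: it assumes anonymous pull access on Docker Hub and a single-arch schema-2 manifest (multi-arch manifest lists need an extra hop, and quay.io has its own token endpoint); `compressed_size` and `fetch_manifest` are hypothetical helper names.

```python
import json
import urllib.request


def compressed_size(manifest: dict) -> int:
    """Sum the compressed layer sizes reported in a schema-2 image manifest."""
    return sum(layer["size"] for layer in manifest.get("layers", []))


def fetch_manifest(repo: str, tag: str) -> dict:
    """Fetch a schema-2 manifest for a public Docker Hub repository.

    Assumes anonymous pull access; other registries (e.g. quay.io)
    use different token endpoints.
    """
    token = json.load(urllib.request.urlopen(
        "https://auth.docker.io/token"
        f"?service=registry.docker.io&scope=repository:{repo}:pull"))["token"]
    req = urllib.request.Request(
        f"https://registry-1.docker.io/v2/{repo}/manifests/{tag}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.docker.distribution.manifest.v2+json",
        })
    return json.load(urllib.request.urlopen(req))
```

For example, `compressed_size(fetch_manifest("seldonio/mlserver", "1.3.5-slim"))` should report the pull size discussed above.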
Our discussions typically revolved around the model being huge. In this case though, the model is small (54 MB) and the runtime is huge.
I wonder whether we could demonstrate the same behaviour with a runtime that is less generic but whose size would be more palatable.
`seldonio/mlserver` and `openvino/mlserver` are popular and cover most of the model frameworks and types/flavors. If we're looking for the best compatibility, using these along with MLflow is probably the best option.
We can take the KServe approach and write our own runtimes if we want many smaller runtimes instead of a couple of large ones ¯\_(ツ)_/¯.
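To make the "write our own runtimes" idea concrete, here is a stdlib-only toy sketch of a tiny model-specific server speaking a minimal slice of a V2-style inference request/response shape. This is not MLServer or KServe code (a real custom MLServer runtime would subclass `mlserver.MLModel` instead), and `predict` is a placeholder standing in for an actual model; the point is only that a purpose-built runtime can be very small.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(inputs):
    # Placeholder "model": sum each input row. A real runtime would
    # load and invoke the actual (here: 54 MB) model instead.
    return [sum(row) for row in inputs]


class V2Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expects {"inputs": [{"name": ..., "data": [[...], ...]}]},
        # a simplified subset of the V2 inference protocol payload.
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        data = body["inputs"][0]["data"]
        resp = json.dumps(
            {"outputs": [{"name": "predict", "data": predict(data)}]}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(resp)))
        self.end_headers()
        self.wfile.write(resp)

    def log_message(self, *args):
        # Keep the example quiet.
        pass


def serve(port=8080):
    """Start the toy runtime on localhost in a background thread."""
    server = HTTPServer(("127.0.0.1", port), V2Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The trade-off is exactly the one mentioned above: this kind of runtime stays small because it serves one model flavor, while `seldonio/mlserver` pays its ~900 MB for generality.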
Unlike the `tensorflow-housing` image, whose reported size is 120.6 MiB (at https://quay.io/repository/rhoai-edge/tensorflow-housing?tab=tags&tag=latest, and also reproduced in my setup), the `bike-rentals-auto-ml` reported size is 1.6 GiB, at https://quay.io/repository/rhoai-edge/bike-rentals-auto-ml?tab=tags and again in my setup -- see https://github.com/opendatahub-io/ai-edge/issues/93#issuecomment-1730827828 and https://github.com/opendatahub-io/ai-edge/issues/93#issuecomment-1730835827.
The `pipelines/README.md` should likely explain why those container images differ by an order of magnitude in size, and what is so special about the second one that it warrants the extra bits.