opendatahub-io / caikit

Caikit is an AI toolkit that enables users to manage models through a set of developer-friendly APIs.
Apache License 2.0

Wait for model directory to exist when "lazy_load_local_models" is enabled #21

Open rhuss opened 5 months ago

rhuss commented 5 months ago

Is your feature request related to a problem? Please describe.

With the new modelcar feature of KServe, it is possible to access model data directly from within an OCI image without downloading or copying it. However, since the modelcar is injected as a sidecar container, the startup order of the containers is non-deterministic.

Since the modelcar container creates the symbolic link /mnt/models pointing to the model stored within its image, the path /mnt/models might not yet exist when the caikit transformer container starts. In that case the check at https://github.com/opendatahub-io/caikit/blob/4b42d37f240b60c3199ba9db260682c0094c464d/caikit/runtime/model_management/model_manager.py#L105 is triggered and startup fails.

Describe the solution you'd like

When lazy loading is enabled via lazy_load_local_models, the runtime should wait a certain amount of time for the path (/mnt/models) to appear before giving up. Choosing that timeout is tricky: if the model image is not yet present on the node, the container runtime must first pull it from the registry, which can take quite some time; if the image is already present, the model should become available within a matter of seconds.
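A minimal sketch of the bounded wait described above, assuming a simple polling loop. The helper name `wait_for_model_dir` and its parameters `timeout_s` and `poll_interval_s` are hypothetical and not part of caikit; how such a helper would be wired into model_manager.py is not shown. It uses `os.path.isdir`, which follows symlinks, so a /mnt/models link whose target is not yet reachable counts as "not ready".

```python
import os
import time


def wait_for_model_dir(
    path: str = "/mnt/models",
    timeout_s: float = 300.0,
    poll_interval_s: float = 1.0,
) -> bool:
    """Poll until `path` resolves to a directory or `timeout_s` elapses.

    Hypothetical helper for illustration only; not part of caikit's API.
    os.path.isdir() follows symlinks, so a dangling symlink (link created
    but target not yet available) is treated as "not ready".
    Returns True if the directory appeared, False on timeout.
    """
    # Monotonic clock so wall-clock adjustments cannot shorten or extend the wait.
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.isdir(path):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_interval_s, remaining))
```

A caller could, for example, run `wait_for_model_dir("/mnt/models", timeout_s=600)` before scanning the local model directory and raise its own error if it returns `False`. Because an image pull can take minutes while an already-cached image appears within seconds, a generous, configurable timeout combined with a short poll interval covers both cases without delaying the fast path.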

Additional context

More about the modelcar approach can be found here: