kserve / modelmesh-serving

Controller for ModelMesh
Apache License 2.0

Excessive unloading of models when loading an additional model #455

Open Gaddy-BL opened 1 year ago

Gaddy-BL commented 1 year ago

Issue Description

We see mm container logs where a thread (here model-load-5e5db6cc) that is loading a model triggers the evacuation of all (or most of) the loaded models. The evacuations are all triggered within the same millisecond.
The evacuation triggers are followed by a warning log:

Entire cache capacity of 1835008 units (14336MiB) is now taken up by removed models that are still unloading

The model we are loading is ~1GiB, the same size as the already loaded models, while the cache capacity is 14336MiB (~14GiB) - so loading it should occupy roughly 1/14 of the cache and should not require unloading so many loaded models.

We are trying to follow the code in ModelMesh.java between the line that sets the thread name (curThread.setName("model-load-" + modelId)) and the log line that reports the load is starting (logger.info("Starting load for model " + modelId + " type=" + modelType)) to understand what triggers the evacuation of the loaded models.

We'd like to know how ModelMesh decided that it should evacuate so many models, and where this decision happens in the code.
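One hypothesis that would match both the mass evacuation and the "removed models that are still unloading" warning: if the loading thread evicts LRU entries until enough capacity is free, but evicted models keep holding their capacity until their asynchronous unload completes, a single load can evict everything. The following is a minimal, self-contained sketch of that failure mode - it is not ModelMesh's actual code, and all names in it are hypothetical:

```java
// Hypothetical sketch (NOT ModelMesh's actual implementation): a loader that
// evicts until committed capacity is free, while unloads release capacity
// asynchronously, can evict far more models than the new load needs.
import java.util.ArrayDeque;
import java.util.Deque;

public class EvictionCascadeSketch {
    static final long CAPACITY = 14L * 1024;  // 14336 MiB, as in the warning log
    static long used = 0;                     // still includes models that are unloading
    static final Deque<Long> lru = new ArrayDeque<>();

    static void loadModel(long sizeMiB) {
        int evicted = 0;
        // BUG HYPOTHESIS: evicted models stay counted in `used` until their
        // asynchronous unload finishes, so this loop never sees freed space
        // and keeps evicting until the cache is empty.
        while (CAPACITY - used < sizeMiB && !lru.isEmpty()) {
            lru.pollLast();   // mark the LRU entry as removed...
            evicted++;        // ...but `used` is not decremented until the
                              // unload completes asynchronously later.
        }
        System.out.println("Evicted " + evicted + " models to load one of "
                + sizeMiB + " MiB; capacity still held by unloading models: "
                + used + " MiB");
    }

    public static void main(String[] args) {
        // Fill the cache with 14 models of ~1 GiB each, then load one more.
        for (int i = 0; i < 14; i++) { lru.push(1024L); used += 1024; }
        loadModel(1024);  // evicts all 14 models, mirroring the reported behavior
    }
}
```

Running this evicts all 14 resident models for a single 1 GiB load, which is the pattern we see in the logs.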

To Reproduce

We don't have a reliable way to reproduce this issue, but it happens quite often in our cluster. It seems to occur when GPU memory is already filled with the maximum number of models it can hold and we then try to load an additional model.

Expected behavior

At most one or two models should be unloaded if space is required to load an additional model with the same characteristics as the already loaded models.
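For contrast, here is a minimal sketch (again hypothetical, not ModelMesh code) of the accounting we would expect: capacity pending release from in-flight unloads is credited toward the incoming load, so only one ~1 GiB model is evicted to fit another ~1 GiB model:

```java
// Hypothetical sketch of the expected accounting: space being freed by
// in-flight unloads counts toward the new load, so eviction stops as soon
// as enough space is pending.
import java.util.ArrayDeque;
import java.util.Deque;

public class ExpectedEvictionSketch {
    static final long CAPACITY = 14L * 1024;  // 14336 MiB
    static long loaded = 0;                   // committed capacity, incl. unloading models
    static long unloadingCredit = 0;          // capacity pending release, credited up front
    static final Deque<Long> lru = new ArrayDeque<>();

    static void loadModel(long sizeMiB) {
        int evicted = 0;
        // Evict only until free space plus pending-unload credit covers the load.
        while ((CAPACITY - loaded) + unloadingCredit < sizeMiB && !lru.isEmpty()) {
            long size = lru.pollLast();
            unloadingCredit += size;  // credit the space immediately, even though
                                      // the actual unload completes asynchronously
            evicted++;
        }
        System.out.println("Evicted " + evicted + " model(s) to make room for "
                + sizeMiB + " MiB");
    }

    public static void main(String[] args) {
        for (int i = 0; i < 14; i++) { lru.push(1024L); loaded += 1024; }
        loadModel(1024);  // evicts exactly one ~1 GiB model
    }
}
```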

Screenshots

The Kibana logs


Environment:

We are using version 0.11.0 and running on a g4dn.xlarge instance.

ckadner commented 1 year ago

@njhill -- can you provide some of your findings on this one?

Gaddy-BL commented 1 year ago

@ckadner - do you know if someone is looking into this? Perhaps @njhill?

BenHaItay commented 1 month ago

@njhill @ckadner do you have any idea what the cause could be? We might be able to investigate it on our end as well.