Open Gaddy-BL opened 1 year ago
@njhill -- can you provide some of your findings on this one?
@ckadner - do you know if someone is looking into this? Perhaps @njhill ?
@njhill @ckadner do you have any idea what the cause could be? We might be able to investigate it on our end as well.
Issue Description
We see mm container logs where a thread that is loading a model (here model-load-5e5db6cc) triggers the evacuation of all, or most, of the loaded models. The evacuations are all triggered within the same millisecond.
The evacuation triggers are followed by a warning log:
The model we are loading is 1G, the same size as the models that are already loaded, so it should not require unloading so many of them.
We are trying to follow the code in ModelMesh.java between the line that sets the thread name (curThread.setName("model-load-" + modelId)) and the log that reports that the load is starting (logger.info("Starting load for model " + modelId + " type=" + modelType)) to understand what triggers the evacuation of the loaded models. We'd like to know how modelmesh decides that it should evacuate so many models, and where in the code this happens.
To Reproduce
We don't have a reliable way to reproduce this issue, but it happens quite often in our cluster. It seems to happen when the GPU memory already holds the maximum number of models it can carry and we then try to load an additional model.
Expected behavior
At most one or two models should be unloaded if space is required to load an additional model with the same characteristics as the loaded model.
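To make the expectation concrete, here is a minimal sketch of the size-based LRU eviction we would expect. This is not ModelMesh's actual code: the class name, the method name, and the capacity numbers are all hypothetical, chosen only to illustrate the arithmetic. With a cache that is exactly full of 1G models, making room for one more 1G model should require evicting a single model.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, NOT ModelMesh code: size-based LRU eviction.
class EvictionSketch {
    /**
     * Returns how many least-recently-used models must be evicted so that
     * a new model of newSizeMb fits within capacityMb.
     * loadedSizesMb is ordered from least to most recently used.
     */
    static int evictionsNeeded(long capacityMb, List<Long> loadedSizesMb, long newSizeMb) {
        if (newSizeMb > capacityMb) {
            throw new IllegalArgumentException("model is larger than the total capacity");
        }
        Deque<Long> lru = new ArrayDeque<>(loadedSizesMb);
        long usedMb = 0;
        for (long size : loadedSizesMb) {
            usedMb += size;
        }
        int evicted = 0;
        // Evict from the least-recently-used end until the new model fits.
        while (usedMb + newSizeMb > capacityMb && !lru.isEmpty()) {
            usedMb -= lru.pollFirst();
            evicted++;
        }
        return evicted;
    }

    public static void main(String[] args) {
        // Assumed numbers for illustration only: 14 GB of usable capacity,
        // fully occupied by fourteen 1 GB models.
        List<Long> loaded = Collections.nCopies(14, 1024L);
        System.out.println(evictionsNeeded(14 * 1024L, loaded, 1024L)); // prints 1
    }
}
```

If the real capacity accounting differs (for example, if model sizes are estimated rather than measured), the count could legitimately be higher than one, but it is hard to see how it would reach all or most of the loaded models.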
Screenshots
The Kibana logs
Environment:
We are using version 0.11.0 and running on a g4dn.xlarge instance.