After downloading the model, memory spikes to >8GB for about a second while the log line INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:121: Starting JinaBertModel model on Cpu is printed.
After startup, memory usage drops below 4GB and stays there.
Expected behavior
The container should not produce large memory spikes that occur only during model load, as these can cause resource errors.
Otherwise Kubernetes Deployments may need to provision double the resources actually needed for inference for each container, leaving a large amount of memory capacity unused.
I tried to deploy this to a RH OpenShift cluster with a hard pod memory limit of 4GB and failed because of this, although after startup the container never needs more than 4GB of memory for handling requests and inference; it only exceeds the limit on startup.
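For anyone trying to confirm the startup spike outside the cluster, one option is to poll the server process's peak resident set size (the VmHWM field in /proc on Linux). A minimal sketch, assuming a local Linux host and the container/server PID passed as an argument (the helper name is mine, not part of the project):

```python
import sys
import time

def peak_rss_kib(pid: int) -> int:
    """Read the peak resident set size (VmHWM) of a process from /proc."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmHWM:"):
                return int(line.split()[1])  # value is reported in kB
    raise RuntimeError("VmHWM not found")

if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])
    while True:
        print(f"peak RSS: {peak_rss_kib(pid) / 1024:.0f} MiB")
        time.sleep(1)
```

Because VmHWM is a high-water mark, it captures the transient load spike even if you only read it after startup has finished.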
This is probably related to the implementation of JinaBert.
When trying a model with another architecture, such as intfloat/multilingual-e5-large, I don't see this behavior.
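A roughly 2x spike at load time would be consistent with a load path that keeps two copies of the weights alive at once (for example, the raw weight buffer plus the constructed tensors) before the first copy is dropped; this is an assumption on my part, not a confirmed diagnosis of the JinaBert code. A toy illustration of that allocation pattern:

```python
import tracemalloc

def load_naive(raw: bytes) -> list[float]:
    # Hypothetical load path: the raw buffer and the parsed weights are
    # both alive until this function returns, so peak memory is roughly
    # the sum of both, even though only the parsed weights are kept.
    parsed = [b / 255 for b in raw]  # second full-size copy built here
    return parsed                    # raw is only freed after returning

tracemalloc.start()
weights = load_naive(bytes(1_000_000))
current, peak = tracemalloc.get_traced_memory()
print(f"current={current} peak={peak}")  # peak exceeds steady-state usage
```

Streaming the weights tensor-by-tensor (or memory-mapping the file) avoids holding both copies simultaneously, which may be what the other architectures' load paths already do.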
System Info
Image: v1.2 CPU
Model used: jinaai/jina-embeddings-v2-base-de
Deployment: Docker / RH OpenShift
Reproduction
INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:121: Starting JinaBertModel model on Cpu

Memory spikes to >8GB for a second when this line is logged.