huggingface / text-embeddings-inference

A blazing fast inference solution for text embeddings models
https://huggingface.co/docs/text-embeddings-inference/quick_tour

CPU Image: High memory usage on startup #303

Open freinold opened 3 months ago

freinold commented 3 months ago

System Info

Image: v1.2 CPU
Model used: jinaai/jina-embeddings-v2-base-de
Deployment: Docker / RH OpenShift


Reproduction

  1. Run the CPU image with the following compose.yaml:
     version: '3.8'
     name: test-tei
     services:
       tei:
         image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
         command: ["--tokenization-workers", "1"]
         environment:
           MODEL_ID: "jinaai/jina-embeddings-v2-base-de"
           REVISION: "5078d9924a7b3bdd9556928fcfc08b8de041bfc1"
           MAX_CLIENT_BATCH_SIZE: 64
         volumes:
           - ./tei-docker-data:/data
         ports:
           - "8081:80"
  2. Monitor memory usage (e.g. via Docker Desktop).
  3. After the model has downloaded and the log line INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:121: Starting JinaBertModel model on Cpu appears, memory spikes to >8 GB for a second.
  4. After startup, memory usage drops below 4 GB and stays there (a memory-limit sketch for reproducing the resulting failure follows this list).
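
A minimal sketch of how the spike can be turned into a hard failure locally, assuming a Docker Compose version that honors the mem_limit field; the 4g value only mirrors the pod limit of the cluster described below, it is not a recommendation:

     services:
       tei:
         # ...same service definition as in step 1...
         # Hard memory cap mirroring the 4 GB pod limit; with it in place the
         # container should be OOM-killed during the JinaBertModel load spike.
         mem_limit: 4g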

Expected behavior

The container should not produce large memory spikes during model load that can cause resource errors. Otherwise, Kubernetes deployments may need to provision double the memory actually needed for inference for each container, leading to a large amount of unused memory capacity.

I tried to deploy this to an RH OpenShift cluster with a hard pod memory limit of 4 GB and failed because of this, even though after startup the container never needs more than 4 GB of memory for handling requests and inference; it only exceeds that limit during startup.
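
For illustration, a hypothetical fragment of the Kubernetes/OpenShift container spec showing the mismatch (names and values are examples, not taken from the actual deployment):

     containers:
       - name: tei                # hypothetical container name
         image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
         resources:
           requests:
             memory: "4Gi"        # enough for steady-state inference
           limits:
             memory: "4Gi"        # hard cluster limit; the ~8 GB load spike gets the pod OOM-killed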

freinold commented 3 months ago

This is probably related to the implementation of JinaBert. When trying a model with another architecture, such as intfloat/multilingual-e5-large, I don't get this behavior.
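
For reference, a sketch of that comparison using the same compose file, assuming only the environment block is changed (no revision pin for the e5 model):

     environment:
       MODEL_ID: "intfloat/multilingual-e5-large"   # different architecture, no startup spike observed
       MAX_CLIENT_BATCH_SIZE: 64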