immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app

immich_machine_learning openvino [GPU] out of GPU resources N5095 8Gb #13674

Closed · lollipopll closed this issue 1 hour ago

lollipopll commented 1 day ago

The bug

The machine-learning process crashes immediately after startup: its memory utilization reaches 8GB within a few seconds, and afterwards it stops working even on the CPU (N5095).

2024-10-22 11:55:32.645913931 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running OpenVINO-EP-subgraph_2 node. Name:'OpenVINOExecutionProvider_OpenVINO-EP-subgraph_2_0' Status Message: /onnxruntime/onnxruntime/core/providers/openvino/ov_interface.cc:243 void onnxruntime::openvino_ep::OVInferRequest::WaitRequest() [OpenVINO-EP] Wait Model Failed: Exception from src/inference/src/cpp/infer_request.cpp:245:
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_engine.cpp:185:
[GPU] out of GPU resources
[10/22/24 11:55:32] ERROR Exception in ASGI application (full traceback under "Relevant log output" below)

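For context, the traceback under "Relevant log output" below ends in insightface's arcface_onnx.py get_feat, which packs every cropped face into a single blob and embeds it with one session.run call. On an N5095 the "GPU" is the integrated UHD Graphics sharing the 8GB of system RAM, so a sufficiently large batch can exhaust it. A minimal sketch of a chunked alternative (a hypothetical helper, not immich's actual code):

import numpy as np
import onnxruntime as ort

def embed_in_chunks(session: ort.InferenceSession,
                    faces: np.ndarray,            # (N, 3, 112, 112) float32 face crops
                    chunk_size: int = 8) -> np.ndarray:
    # Embed the crops in small fixed-size chunks instead of one large batch,
    # keeping the per-request allocation on the iGPU small.
    input_name = session.get_inputs()[0].name
    outputs = []
    for start in range(0, len(faces), chunk_size):
        chunk = faces[start:start + chunk_size]
        outputs.append(session.run(None, {input_name: chunk})[0])
    return np.concatenate(outputs, axis=0)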

The OS that Immich Server is running on

Ubuntu 22.04.5 LTS

Version of Immich Server

v1.118.2

Version of Immich Mobile App

v1.118.2

Platform with the issue

Your docker-compose.yml content

#
# WARNING: Make sure to use the docker-compose.yml of the current release:
#
# https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml
#
# The compose file on main may not be compatible with the latest release.
#

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    extends:
      file: hwaccel.transcoding.yml
      service: quicksync # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      # Do not edit the next line. If you want to change the media storage location on your system, edit the value of UPLOAD_LOCATION in the .env file
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    ports:
      - '2283:2283'
    depends_on:
      - redis
      - database
    restart: always
    healthcheck:
      disable: false

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-openvino
    extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
      file: hwaccel.ml.yml
      service: openvino # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always
    healthcheck:
      disable: false

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:2ba50e1ac3a0ea17b736ce9db2b0a9f6f8b85d4c27d5f5accc6a416d8f42c6d5
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      # Do not edit the next line. If you want to change the database storage location on your system, edit the value of DB_DATA_LOCATION in the .env file
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    ports:
      - 54321:5432
    healthcheck:
      test: pg_isready --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum"; [ "$$Chksum" = '0' ] || exit 1
      interval: 5m
      start_interval: 30s
      start_period: 5m
    command:
      [
        'postgres',
        '-c',
        'shared_preload_libraries=vectors.so',
        '-c',
        'search_path="$$user", public, vectors',
        '-c',
        'logging_collector=on',
        '-c',
        'max_wal_size=2GB',
        '-c',
        'shared_buffers=512MB',
        '-c',
        'wal_compression=on',
      ]
    restart: always

volumes:
  model-cache:

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

UPLOAD_LOCATION=/mnt/immich/library
DB_DATA_LOCATION=./postgres

# To set a timezone, uncomment the next line and change Etc/UTC to a TZ identifier from this list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List
# TZ=Etc/UTC

IMMICH_VERSION=release

DB_PASSWORD=***
DB_USERNAME=postgres
DB_DATABASE_NAME=immich

Reproduction steps

  1. Open http://host:2283/admin/jobs-status
  2. Run the FACE DETECTION job
  3. Job count = 1
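
The same error can also be reproduced without the web UI by driving the OpenVINO GPU device directly with onnxruntime. A hedged standalone sketch (model path, input shape, and batch size are assumptions chosen to mimic ArcFace-style 112x112 face crops):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "recognition.onnx",                              # assumed ArcFace-style model file
    providers=["OpenVINOExecutionProvider"],
    provider_options=[{"device_type": "GPU"}],       # force the Intel iGPU
)
input_name = session.get_inputs()[0].name
batch = np.random.rand(256, 3, 112, 112).astype(np.float32)  # one oversized batch of crops
session.run(None, {input_name: batch})               # expected to fail with "[GPU] out of GPU resources"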

Relevant log output

2024-10-22 11:55:32.645913931 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running OpenVINO-EP-subgraph_2 node. Name:'OpenVINOExecutionProvider_OpenVINO-EP-subgraph_2_0' Status Message: /onnxruntime/onnxruntime/core/providers/openvino/ov_interface.cc:243 void onnxruntime::openvino_ep::OVInferRequest::WaitRequest() [OpenVINO-EP]  Wait Model Failed: Exception from src/inference/src/cpp/infer_request.cpp:245:
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_engine.cpp:185:
[GPU] out of GPU resources
[10/22/24 11:55:32] ERROR    Exception in ASGI application                      

                             ╭─────── Traceback (most recent call last) ───────╮
                             │ /usr/src/app/main.py:150 in predict             │
                             │                                                 │
                             │   147 │   │   inputs = text                     │
                             │   148 │   else:                                 │
                             │   149 │   │   raise HTTPException(400, "Either  │
                             │ ❱ 150 │   response = await run_inference(inputs │
                             │   151 │   return ORJSONResponse(response)       │
                             │   152                                           │
                             │   153                                           │
                             │                                                 │
                             │ /usr/src/app/main.py:175 in run_inference       │
                             │                                                 │
                             │   172 │   without_deps, with_deps = entries     │
                             │   173 │   await asyncio.gather(*[_run_inference │
                             │   174 │   if with_deps:                         │
                             │ ❱ 175 │   │   await asyncio.gather(*[_run_infer │
                             │   176 │   if isinstance(payload, Image):        │
                             │   177 │   │   response["imageHeight"], response │
                             │   178                                           │
                             │                                                 │
                             │ /usr/src/app/main.py:168 in _run_inference      │
                             │                                                 │
                             │   165 │   │   │   │   message = f"Task {entry[' │
                             │       output of {dep}"                          │
                             │   166 │   │   │   │   raise HTTPException(400,  │
                             │   167 │   │   model = await load(model)         │
                             │ ❱ 168 │   │   output = await run(model.predict, │
                             │   169 │   │   outputs[model.identity] = output  │
                             │   170 │   │   response[entry["task"]] = output  │
                             │   171                                           │
                             │                                                 │
                             │ /usr/src/app/main.py:186 in run                 │
                             │                                                 │
                             │   183 │   if thread_pool is None:               │
                             │   184 │   │   return func(*args, **kwargs)      │
                             │   185 │   partial_func = partial(func, *args, * │
                             │ ❱ 186 │   return await asyncio.get_running_loop │
                             │   187                                           │
                             │   188                                           │
                             │   189 async def load(model: InferenceModel) ->  │
                             │                                                 │
                             │ /usr/local/lib/python3.11/concurrent/futures/th │
                             │ read.py:58 in run                               │
                             │                                                 │
                             │ /usr/src/app/models/base.py:60 in predict       │
                             │                                                 │
                             │    57 │   │   self.load()                       │
                             │    58 │   │   if model_kwargs:                  │
                             │    59 │   │   │   self.configure(**model_kwargs │
                             │ ❱  60 │   │   return self._predict(*inputs, **m │
                             │    61 │                                         │
                             │    62 │   @abstractmethod                       │
                             │    63 │   def _predict(self, *inputs: Any, **mo │
                             │                                                 │
                             │ /usr/src/app/models/facial_recognition/recognit │
                             │ ion.py:45 in _predict                           │
                             │                                                 │
                             │   42 │   │   │   return []                      │
                             │   43 │   │   inputs = decode_cv2(inputs)        │
                             │   44 │   │   cropped_faces = self._crop(inputs, │
                             │ ❱ 45 │   │   embeddings = self._predict_batch(c │
                             │      self._predict_single(cropped_faces)        │
                             │   46 │   │   return self.postprocess(faces, emb │
                             │   47 │                                          │
                             │   48 │   def _predict_batch(self, cropped_faces │
                             │      NDArray[np.float32]:                       │
                             │                                                 │
                             │ /usr/src/app/models/facial_recognition/recognit │
                             │ ion.py:49 in _predict_batch                     │
                             │                                                 │
                             │   46 │   │   return self.postprocess(faces, emb │
                             │   47 │                                          │
                             │   48 │   def _predict_batch(self, cropped_faces │
                             │      NDArray[np.float32]:                       │
                             │ ❱ 49 │   │   embeddings: NDArray[np.float32] =  │
                             │   50 │   │   return embeddings                  │
                             │   51 │                                          │
                             │   52 │   def _predict_single(self, cropped_face │
                             │      NDArray[np.float32]:                       │
                             │                                                 │
                             │ /opt/venv/lib/python3.11/site-packages/insightf │
                             │ ace/model_zoo/arcface_onnx.py:84 in get_feat    │
                             │                                                 │
                             │   81 │   │                                      │
                             │   82 │   │   blob = cv2.dnn.blobFromImages(imgs │
                             │   83 │   │   │   │   │   │   │   │   │     (sel │
                             │      self.input_mean), swapRB=True)             │
                             │ ❱ 84 │   │   net_out = self.session.run(self.ou │
                             │   85 │   │   return net_out                     │
                             │   86 │                                          │
                             │   87 │   def forward(self, batch_data):         │
                             │                                                 │
                             │ /usr/src/app/sessions/ort.py:49 in run          │
                             │                                                 │
                             │    46 │   │   input_feed: dict[str, NDArray[np. │
                             │    47 │   │   run_options: Any = None,          │
                             │    48 │   ) -> list[NDArray[np.float32]]:       │
                             │ ❱  49 │   │   outputs: list[NDArray[np.float32] │
                             │       run_options)                              │
                             │    50 │   │   return outputs                    │
                             │    51 │                                         │
                             │    52 │   @property                             │
                             │                                                 │
                             │ /opt/venv/lib/python3.11/site-packages/onnxrunt │
                             │ ime/capi/onnxruntime_inference_collection.py:22 │
                             │ 0 in run                                        │
                             │                                                 │
                             │    217 │   │   if not output_names:             │
                             │    218 │   │   │   output_names = [output.name  │
                             │    219 │   │   try:                             │
                             │ ❱  220 │   │   │   return self._sess.run(output │
                             │    221 │   │   except C.EPFail as err:          │
                             │    222 │   │   │   if self._enable_fallback:    │
                             │    223 │   │   │   │   print(f"EP Error: {err!s │
                             ╰─────────────────────────────────────────────────╯
                             Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero     
                             status code returned while running                 
                             OpenVINO-EP-subgraph_2 node.                       
                             Name:'OpenVINOExecutionProvider_OpenVINO-EP-subgrap
                             h_2_0' Status Message:                             
                             /onnxruntime/onnxruntime/core/providers/openvino/ov
                             _interface.cc:243 void                             
                             onnxruntime::openvino_ep::OVInferRequest::WaitReque
                             st() [OpenVINO-EP]  Wait Model Failed: Exception   
                             from src/inference/src/cpp/infer_request.cpp:245:  
                             Exception from                                     
                             src/plugins/intel_gpu/src/runtime/ocl/ocl_engine.cp
                             p:185:                                             
                             [GPU] out of GPU resources
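
The except C.EPFail branch visible near the end of the traceback is onnxruntime's provider-fallback path: when fallback is enabled and another provider is still available, the session is recreated without the failing provider and the call is retried (typically on CPU). A hedged sketch of a session with an explicit CPU fallback, not immich's exact configuration from /usr/src/app/sessions/ort.py:

import onnxruntime as ort

session = ort.InferenceSession(
    "/cache/facial-recognition/buffalo_l/recognition.onnx",  # assumed model cache path
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
    provider_options=[{"device_type": "GPU"}, {}],   # try the Intel iGPU first, then CPU
)
print(session.get_providers())                       # shows which providers were registered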

Additional information

No response

lollipopll commented 22 hours ago

From https://immich.app/docs/features/ml-hardware-acceleration/ (OpenVINO): "The server must have a discrete GPU, i.e. Iris Xe or Arc. Expect issues when attempting to use integrated graphics. Ensure the server's kernel version is new enough to use the device for hardware acceleration." :((((((((
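
A quick way to confirm what OpenVINO actually sees inside the immich_machine_learning container is to query the runtime directly (the import path differs slightly between OpenVINO releases):

# Run inside the machine-learning container, e.g. via docker exec.
from openvino import Core  # on older 2023.x packages: from openvino.runtime import Core

core = Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU'] when the iGPU is passed through
for device in core.available_devices:
    print(device, core.get_property(device, "FULL_DEVICE_NAME"))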