immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app

immich_machine_learning openvino [GPU] out of GPU resources N5095 8Gb #13674

Closed · lollipopll closed this issue 1 hour ago

lollipopll commented 1 day ago

The bug

The machine-learning process crashes immediately after startup: its memory utilization reaches 8GB within a few seconds, and afterwards it stops working even on the CPU (N5095).

2024-10-22 11:55:32.645913931 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running OpenVINO-EP-subgraph_2 node. Name:'OpenVINOExecutionProvider_OpenVINO-EP-subgraph_2_0' Status Message: /onnxruntime/onnxruntime/core/providers/openvino/ov_interface.cc:243 void onnxruntime::openvino_ep::OVInferRequest::WaitRequest() [OpenVINO-EP] Wait Model Failed: Exception from src/inference/src/cpp/infer_request.cpp:245:
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_engine.cpp:185:
[GPU] out of GPU resources
[10/22/24 11:55:32] ERROR Exception in ASGI application (full traceback under "Relevant log output" below)

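For context, the traceback under "Relevant log output" below ends in insightface's arcface_onnx.py get_feat, which packs every cropped face into a single blob and embeds it with one session.run call. On an N5095 the "GPU" is the integrated UHD Graphics sharing the 8GB of system RAM, so a sufficiently large batch can exhaust it. A minimal sketch of a chunked alternative (a hypothetical helper, not immich's actual code):

import numpy as np
import onnxruntime as ort

def embed_in_chunks(session: ort.InferenceSession,
                    faces: np.ndarray,            # (N, 3, 112, 112) float32 face crops
                    chunk_size: int = 8) -> np.ndarray:
    # Embed the crops in small fixed-size chunks instead of one large batch,
    # keeping the per-request allocation on the iGPU small.
    input_name = session.get_inputs()[0].name
    outputs = []
    for start in range(0, len(faces), chunk_size):
        chunk = faces[start:start + chunk_size]
        outputs.append(session.run(None, {input_name: chunk})[0])
    return np.concatenate(outputs, axis=0)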

The OS that Immich Server is running on

Ubuntu 22.04.5 LTS

Version of Immich Server

v1.118.2

Version of Immich Mobile App

v1.118.2

Platform with the issue

Your docker-compose.yml content

#
# WARNING: Make sure to use the docker-compose.yml of the current release:
#
# https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml
#
# The compose file on main may not be compatible with the latest release.
#

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    extends:
      file: hwaccel.transcoding.yml
      service: quicksync # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      # Do not edit the next line. If you want to change the media storage location on your system, edit the value of UPLOAD_LOCATION in the .env file
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    ports:
      - '2283:2283'
    depends_on:
      - redis
      - database
    restart: always
    healthcheck:
      disable: false

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-openvino
    extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
      file: hwaccel.ml.yml
      service: openvino # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always
    healthcheck:
      disable: false

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:2ba50e1ac3a0ea17b736ce9db2b0a9f6f8b85d4c27d5f5accc6a416d8f42c6d5
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      # Do not edit the next line. If you want to change the database storage location on your system, edit the value of DB_DATA_LOCATION in the .env file
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    ports:
      - 54321:5432
    healthcheck:
      test: pg_isready --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum"; [ "$$Chksum" = '0' ] || exit 1
      interval: 5m
      start_interval: 30s
      start_period: 5m
    command:
      [
        'postgres',
        '-c',
        'shared_preload_libraries=vectors.so',
        '-c',
        'search_path="$$user", public, vectors',
        '-c',
        'logging_collector=on',
        '-c',
        'max_wal_size=2GB',
        '-c',
        'shared_buffers=512MB',
        '-c',
        'wal_compression=on',
      ]
    restart: always

volumes:
  model-cache:

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

UPLOAD_LOCATION=/mnt/immich/library
DB_DATA_LOCATION=./postgres

# To set a timezone, uncomment the next line and change Etc/UTC to a TZ identifier from this list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List
# TZ=Etc/UTC

IMMICH_VERSION=release

DB_PASSWORD=***
DB_USERNAME=postgres
DB_DATABASE_NAME=immich

Reproduction steps

  1. Open http://host:2283/admin/jobs-status
  2. Run the FACE DETECTION job
  3. Job count = 1
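
The same error can also be reproduced without the web UI by driving the OpenVINO GPU device directly with onnxruntime. A hedged standalone sketch (model path, input shape, and batch size are assumptions chosen to mimic ArcFace-style 112x112 face crops):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "recognition.onnx",                              # assumed ArcFace-style model file
    providers=["OpenVINOExecutionProvider"],
    provider_options=[{"device_type": "GPU"}],       # force the Intel iGPU
)
input_name = session.get_inputs()[0].name
batch = np.random.rand(256, 3, 112, 112).astype(np.float32)  # one oversized batch of crops
session.run(None, {input_name: batch})               # expected to fail with "[GPU] out of GPU resources"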

Relevant log output

2024-10-22 11:55:32.645913931 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running OpenVINO-EP-subgraph_2 node. Name:'OpenVINOExecutionProvider_OpenVINO-EP-subgraph_2_0' Status Message: /onnxruntime/onnxruntime/core/providers/openvino/ov_interface.cc:243 void onnxruntime::openvino_ep::OVInferRequest::WaitRequest() [OpenVINO-EP]  Wait Model Failed: Exception from src/inference/src/cpp/infer_request.cpp:245:
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_engine.cpp:185:
[GPU] out of GPU resources
[10/22/24 11:55:32] ERROR    Exception in ASGI application                      

                             ╭─────── Traceback (most recent call last) ───────╮
                             │ /usr/src/app/main.py:150 in predict             │
                             │                                                 │
                             │   147 │   │   inputs = text                     │
                             │   148 │   else:                                 │
                             │   149 │   │   raise HTTPException(400, "Either  │
                             │ ❱ 150 │   response = await run_inference(inputs │
                             │   151 │   return ORJSONResponse(response)       │
                             │   152                                           │
                             │   153                                           │
                             │                                                 │
                             │ /usr/src/app/main.py:175 in run_inference       │
                             │                                                 │
                             │   172 │   without_deps, with_deps = entries     │
                             │   173 │   await asyncio.gather(*[_run_inference │
                             │   174 │   if with_deps:                         │
                             │ ❱ 175 │   │   await asyncio.gather(*[_run_infer │
                             │   176 │   if isinstance(payload, Image):        │
                             │   177 │   │   response["imageHeight"], response │
                             │   178                                           │
                             │                                                 │
                             │ /usr/src/app/main.py:168 in _run_inference      │
                             │                                                 │
                             │   165 │   │   │   │   message = f"Task {entry[' │
                             │       output of {dep}"                          │
                             │   166 │   │   │   │   raise HTTPException(400,  │
                             │   167 │   │   model = await load(model)         │
                             │ ❱ 168 │   │   output = await run(model.predict, │
                             │   169 │   │   outputs[model.identity] = output  │
                             │   170 │   │   response[entry["task"]] = output  │
                             │   171                                           │
                             │                                                 │
                             │ /usr/src/app/main.py:186 in run                 │
                             │                                                 │
                             │   183 │   if thread_pool is None:               │
                             │   184 │   │   return func(*args, **kwargs)      │
                             │   185 │   partial_func = partial(func, *args, * │
                             │ ❱ 186 │   return await asyncio.get_running_loop │
                             │   187                                           │
                             │   188                                           │
                             │   189 async def load(model: InferenceModel) ->  │
                             │                                                 │
                             │ /usr/local/lib/python3.11/concurrent/futures/th │
                             │ read.py:58 in run                               │
                             │                                                 │
                             │ /usr/src/app/models/base.py:60 in predict       │
                             │                                                 │
                             │    57 │   │   self.load()                       │
                             │    58 │   │   if model_kwargs:                  │
                             │    59 │   │   │   self.configure(**model_kwargs │
                             │ ❱  60 │   │   return self._predict(*inputs, **m │
                             │    61 │                                         │
                             │    62 │   @abstractmethod                       │
                             │    63 │   def _predict(self, *inputs: Any, **mo │
                             │                                                 │
                             │ /usr/src/app/models/facial_recognition/recognit │
                             │ ion.py:45 in _predict                           │
                             │                                                 │
                             │   42 │   │   │   return []                      │
                             │   43 │   │   inputs = decode_cv2(inputs)        │
                             │   44 │   │   cropped_faces = self._crop(inputs, │
                             │ ❱ 45 │   │   embeddings = self._predict_batch(c │
                             │      self._predict_single(cropped_faces)        │
                             │   46 │   │   return self.postprocess(faces, emb │
                             │   47 │                                          │
                             │   48 │   def _predict_batch(self, cropped_faces │
                             │      NDArray[np.float32]:                       │
                             │                                                 │
                             │ /usr/src/app/models/facial_recognition/recognit │
                             │ ion.py:49 in _predict_batch                     │
                             │                                                 │
                             │   46 │   │   return self.postprocess(faces, emb │
                             │   47 │                                          │
                             │   48 │   def _predict_batch(self, cropped_faces │
                             │      NDArray[np.float32]:                       │
                             │ ❱ 49 │   │   embeddings: NDArray[np.float32] =  │
                             │   50 │   │   return embeddings                  │
                             │   51 │                                          │
                             │   52 │   def _predict_single(self, cropped_face │
                             │      NDArray[np.float32]:                       │
                             │                                                 │
                             │ /opt/venv/lib/python3.11/site-packages/insightf │
                             │ ace/model_zoo/arcface_onnx.py:84 in get_feat    │
                             │                                                 │
                             │   81 │   │                                      │
                             │   82 │   │   blob = cv2.dnn.blobFromImages(imgs │
                             │   83 │   │   │   │   │   │   │   │   │     (sel │
                             │      self.input_mean), swapRB=True)             │
                             │ ❱ 84 │   │   net_out = self.session.run(self.ou │
                             │   85 │   │   return net_out                     │
                             │   86 │                                          │
                             │   87 │   def forward(self, batch_data):         │
                             │                                                 │
                             │ /usr/src/app/sessions/ort.py:49 in run          │
                             │                                                 │
                             │    46 │   │   input_feed: dict[str, NDArray[np. │
                             │    47 │   │   run_options: Any = None,          │
                             │    48 │   ) -> list[NDArray[np.float32]]:       │
                             │ ❱  49 │   │   outputs: list[NDArray[np.float32] │
                             │       run_options)                              │
                             │    50 │   │   return outputs                    │
                             │    51 │                                         │
                             │    52 │   @property                             │
                             │                                                 │
                             │ /opt/venv/lib/python3.11/site-packages/onnxrunt │
                             │ ime/capi/onnxruntime_inference_collection.py:22 │
                             │ 0 in run                                        │
                             │                                                 │
                             │    217 │   │   if not output_names:             │
                             │    218 │   │   │   output_names = [output.name  │
                             │    219 │   │   try:                             │
                             │ ❱  220 │   │   │   return self._sess.run(output │
                             │    221 │   │   except C.EPFail as err:          │
                             │    222 │   │   │   if self._enable_fallback:    │
                             │    223 │   │   │   │   print(f"EP Error: {err!s │
                             ╰─────────────────────────────────────────────────╯
                             Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero     
                             status code returned while running                 
                             OpenVINO-EP-subgraph_2 node.                       
                             Name:'OpenVINOExecutionProvider_OpenVINO-EP-subgrap
                             h_2_0' Status Message:                             
                             /onnxruntime/onnxruntime/core/providers/openvino/ov
                             _interface.cc:243 void                             
                             onnxruntime::openvino_ep::OVInferRequest::WaitReque
                             st() [OpenVINO-EP]  Wait Model Failed: Exception   
                             from src/inference/src/cpp/infer_request.cpp:245:  
                             Exception from                                     
                             src/plugins/intel_gpu/src/runtime/ocl/ocl_engine.cp
                             p:185:                                             
                             [GPU] out of GPU resources
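
The except C.EPFail branch visible near the end of the traceback is onnxruntime's provider-fallback path: when fallback is enabled and another provider is still available, the session is recreated without the failing provider and the call is retried (typically on CPU). A hedged sketch of a session with an explicit CPU fallback, not immich's exact configuration from /usr/src/app/sessions/ort.py:

import onnxruntime as ort

session = ort.InferenceSession(
    "/cache/facial-recognition/buffalo_l/recognition.onnx",  # assumed model cache path
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
    provider_options=[{"device_type": "GPU"}, {}],   # try the Intel iGPU first, then CPU
)
print(session.get_providers())                       # shows which providers were registered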

Additional information

No response

lollipopll commented 22 hours ago

From https://immich.app/docs/features/ml-hardware-acceleration/ (OpenVINO): "The server must have a discrete GPU, i.e. Iris Xe or Arc. Expect issues when attempting to use integrated graphics. Ensure the server's kernel version is new enough to use the device for hardware acceleration." :((((((((
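
A quick way to confirm what OpenVINO actually sees inside the immich_machine_learning container is to query the runtime directly (the import path differs slightly between OpenVINO releases):

# Run inside the machine-learning container, e.g. via docker exec.
from openvino import Core  # on older 2023.x packages: from openvino.runtime import Core

core = Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU'] when the iGPU is passed through
for device in core.available_devices:
    print(device, core.get_property(device, "FULL_DEVICE_NAME"))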