immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

[BUG] Unable to download CLIP model for search #4117

Closed dankasak closed 1 year ago

dankasak commented 1 year ago

[!Important]

🟢 See this comment for a temporary solution 🟢


The bug

When I search for anything in Immich, I get generic errors in the UI. In the docker logs, I can see that something is trying to download the CLIP model ("Downloading clip model 'ViT-B-32::openai' ... This may take a while"), but it fails within about 3 seconds. I've downloaded the model on the host using curl. Can I persist it somewhere for whatever needs it, and if so, where? And why is the download failing so quickly?

This seems to be triggered from: https://github.com/jina-ai/clip-as-service/blob/main/server/clip_server/model/clip_onnx.py
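For reference, this is roughly what I did on the host with curl, plus what I would guess the copy step looks like (the target directory under my model cache is purely a guess on my part; I don't know the layout the machine-learning container actually expects):

# Fetch the ONNX file the log below complains about (URL copied from the log)
curl -fLO https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-B-32/textual.onnx
# Presumably the visual model sits under the same prefix (assumption on my part)
curl -fLO https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-B-32/visual.onnx

# MODEL_CACHE_LOCATION from my .env; the clip/ViT-B-32__openai subfolder is a guess
mkdir -p /mnt/array0/immich/model-cache/clip/ViT-B-32__openai
cp textual.onnx visual.onnx /mnt/array0/immich/model-cache/clip/ViT-B-32__openai/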

c69e23fa6733 [hooks.server.ts]:handleError Not found: /_app/immutable/nodes/0.c95dfcd6.js
c69e23fa6733 [hooks.server.ts]:handleError Not found: /_app/immutable/chunks/menu-option.36f2860d.js
c69e23fa6733 [hooks.server.ts]:handleError Not found: /_app/immutable/chunks/image-thumbnail.ef5e539c.js
c69e23fa6733 [hooks.server.ts]:handleError Not found: /_app/immutable/chunks/download-action.de99beb0.js
c69e23fa6733 [hooks.server.ts]:handleError Not found: /_app/immutable/chunks/thumbnail.5d0111e5.js
1058d5367490 I20230914 00:44:31.263882 353 raft_server.cpp:546] Term: 8, last_index index: 53064, committed_index: 53064, known_applied_index: 53064, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 848797
1058d5367490 I20230914 00:44:31.263913 442 raft_server.h:60] Peer refresh succeeded!
279bec116ed3 [09/14/23 00:44:33] INFO Downloading clip model 'ViT-B-32::openai'.This may
279bec116ed3 take a while.
279bec116ed3 Failed to download
279bec116ed3 https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d536572
279bec116ed3 76696365/onnx/ViT-B-32/textual.onnx with <HTTPError 416: 'Requested Range Not
279bec116ed3 Satisfiable'> at the 0th attempt
279bec116ed3 Failed to download
279bec116ed3 https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d536572
279bec116ed3 76696365/onnx/ViT-B-32/textual.onnx with <HTTPError 416: 'Requested Range Not
279bec116ed3 Satisfiable'> at the 1th attempt
c69e23fa6733 {
279bec116ed3 Failed to download
f9ab2bb52a73 [Nest] 2 - 09/14/2023, 12:44:39 AM ERROR [ExceptionsHandler] Request for clip failed with status 500: Internal Server Error
279bec116ed3 https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d536572
c69e23fa6733 status: 500,
f9ab2bb52a73 Error: Request for clip failed with status 500: Internal Server Error
279bec116ed3 76696365/onnx/ViT-B-32/textual.onnx with <HTTPError 416: 'Requested Range Not
c69e23fa6733 url: 'GET /search?q=tree&clip=true',
279bec116ed3 Satisfiable'> at the 2th attempt
c69e23fa6733 response: { statusCode: 500, message: 'Internal server error' }
f9ab2bb52a73 at MachineLearningRepository.post (/usr/src/app/dist/infra/repositories/machine-learning.repository.js:29:19)
279bec116ed3 textual.onnx 0.0% • 0.0/254.1 MB • ? • 0:00:00
c69e23fa6733 }
f9ab2bb52a73 at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
279bec116ed3
c69e23fa6733 [hooks.server.ts]:handleError Internal server error
f9ab2bb52a73 at async SearchService.search (/usr/src/app/dist/domain/search/search.service.js:114:35)
279bec116ed3 Exception in ASGI application
f9ab2bb52a73 at async /usr/src/app/node_modules/@nestjs/core/router/router-execution-context.js:46:28
279bec116ed3 Traceback (most recent call last):
f9ab2bb52a73 at async /usr/src/app/node_modules/@nestjs/core/router/router-proxy.js:9:17
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 435, in run_asgi
279bec116ed3 result = await app( # type: ignore[func-returns-value]
279bec116ed3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
279bec116ed3 return await self.app(scope, receive, send)
279bec116ed3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/fastapi/applications.py", line 276, in __call__
279bec116ed3 await super().__call__(scope, receive, send)
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
279bec116ed3 await self.middleware_stack(scope, receive, send)
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
279bec116ed3 raise exc
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
279bec116ed3 await self.app(scope, receive, _send)
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
279bec116ed3 raise exc
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
279bec116ed3 await self.app(scope, receive, sender)
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
279bec116ed3 raise e
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
279bec116ed3 await self.app(scope, receive, send)
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
279bec116ed3 await route.handle(scope, receive, send)
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
279bec116ed3 await self.app(scope, receive, send)
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
279bec116ed3 response = await func(request)
279bec116ed3 ^^^^^^^^^^^^^^^^^^^
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/fastapi/routing.py", line 237, in app
279bec116ed3 raw_response = await run_endpoint_function(
279bec116ed3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
279bec116ed3 return await dependant.call(**values)
279bec116ed3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
279bec116ed3 File "/usr/src/app/main.py", line 75, in predict
279bec116ed3 model = await load(await app.state.model_cache.get(model_name, model_type, **kwargs))
279bec116ed3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
279bec116ed3 File "/usr/src/app/main.py", line 101, in load
279bec116ed3 await loop.run_in_executor(app.state.thread_pool, _load)
279bec116ed3 File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
279bec116ed3 result = self.fn(*self.args, **self.kwargs)
279bec116ed3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
279bec116ed3 File "/usr/src/app/main.py", line 94, in _load
279bec116ed3 model.load()
279bec116ed3 File "/usr/src/app/models/base.py", line 63, in load
279bec116ed3 self.download()
279bec116ed3 File "/usr/src/app/models/base.py", line 58, in download
279bec116ed3 self._download()
279bec116ed3 File "/usr/src/app/models/clip.py", line 51, in _download
279bec116ed3 self._download_model(*models[0])
279bec116ed3 File "/usr/src/app/models/clip.py", line 123, in _download_model
279bec116ed3 download_model(
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/clip_server/model/pretrained_models.py", line 239, in download_model
279bec116ed3 raise RuntimeError(
279bec116ed3 RuntimeError: Failed to download https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-B-32/textual.onnx within retry limit 3
279bec116ed3 [09/14/23 00:44:39] INFO Downloading clip model 'ViT-B-32::openai'.This may
279bec116ed3 take a while.
1058d5367490 I20230914 00:44:41.235440 354 batched_indexer.cpp:284] Running GC for aborted requests, req map size: 0
1058d5367490 I20230914 00:44:41.264710 353 raft_server.cpp:546] Term: 8, last_index index: 53064, committed_index: 53064, known_applied_index: 53064, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 848797
1058d5367490 I20230914 00:44:41.264742 442 raft_server.h:60] Peer refresh succeeded!
279bec116ed3 Failed to download
279bec116ed3 https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d536572
279bec116ed3 76696365/onnx/ViT-B-32/textual.onnx with <HTTPError 416: 'Requested Range Not
279bec116ed3 Satisfiable'> at the 0th attempt
279bec116ed3 Failed to download
279bec116ed3 https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d536572
279bec116ed3 76696365/onnx/ViT-B-32/textual.onnx with <HTTPError 416: 'Requested Range Not
279bec116ed3 Satisfiable'> at the 1th attempt

The OS that Immich Server is running on

Docker

Version of Immich Server

v1.78.0

Version of Immich Mobile App

v1.78.0

Platform with the issue

Your docker-compose.yml content

version: "3.8"

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: [ "start.sh", "immich" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - ${PHOTOPRISM_LOCATION}:/photoprism:ro
    env_file:
      - .env
    depends_on:
      - redis
#      - database
      - typesense
    restart: always

  immich-microservices:
    container_name: immich_microservices
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    extends:
      file: hwaccel.yml
      service: hwaccel
    command: [ "start.sh", "microservices" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - ${PHOTOPRISM_LOCATION}:/photoprism:ro
    env_file:
      - .env
    depends_on:
      - redis
#      - database
      - typesense
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    volumes:
      - ${MODEL_CACHE_LOCATION}:/cache
    env_file:
      - .env
    restart: always

  immich-web:
    container_name: immich_web
    image: ghcr.io/immich-app/immich-web:${IMMICH_VERSION:-release}
    env_file:
      - .env
    restart: always

  typesense:
    container_name: immich_typesense
    image: typesense/typesense:0.24.1@sha256:9bcff2b829f12074426ca044b56160ca9d777a0c488303469143dd9f8259d4dd
    environment:
      - TYPESENSE_API_KEY=${TYPESENSE_API_KEY}
      - TYPESENSE_DATA_DIR=/data
    volumes:
      - ${TYPESENSE_LOCATION}:/data
    restart: always

  redis:
    container_name: immich_redis
    image: redis:6.2-alpine@sha256:70a7a5b641117670beae0d80658430853896b5ef269ccf00d1827427e3263fa3
    restart: always

#  database:
#    container_name: immich_postgres
#    image: postgres:14-alpine@sha256:28407a9961e76f2d285dc6991e8e48893503cc3836a4755bbc2d40bcc272a441
#    env_file:
#      - .env
#    environment:
#      POSTGRES_PASSWORD: ${DB_PASSWORD}
#      POSTGRES_USER: ${DB_USERNAME}
#      POSTGRES_DB: ${DB_DATABASE_NAME}
#    volumes:
#      - ${PG_LOCATION}:/var/lib/postgresql/data
#    restart: always

  immich-proxy:
    container_name: immich_proxy
    image: ghcr.io/immich-app/immich-proxy:${IMMICH_VERSION:-release}
    environment:
      # Make sure these values get passed through from the env file
      - IMMICH_SERVER_URL
      - IMMICH_WEB_URL
    ports:
      - 2283:8080
    depends_on:
      - immich-server
      - immich-web
    restart: always

volumes:
  pgdata:
  model-cache:
  tsdata:

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

# The location where your uploaded files are stored
UPLOAD_LOCATION=/mnt/array0/immich/uploads
PG_LOCATION=/mnt/array0/immich/postgres
MODEL_CACHE_LOCATION=/mnt/array0/immich/model-cache
TYPESENSE_LOCATION=/mnt/array0/immich/typesense

PHOTOPRISM_LOCATION=/mnt/array0/photoprism/originals

# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=release

# Connection secrets for postgres and typesense. You should change these to random passwords
TYPESENSE_API_KEY=blah-bliggedy-blah
DB_PASSWORD=********

# The values below this line do not need to be changed
###################################################################################
DB_HOSTNAME=192.168.1.128
DB_USERNAME=immich
DB_DATABASE_NAME=immich

REDIS_HOSTNAME=immich_redis

Reproduction steps

1. Search for anything in Immich

Additional information

A search in the UI triggers a model download (via https://github.com/jina-ai/clip-as-service/blob/main/server/clip_server/model/clip_onnx.py), which fails almost immediately.
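One way to check whether the S3 endpoint itself is the problem, independent of immich's networking, is to request the same URL directly from the host, e.g.:

# Ask for the response headers only; an HTTP error here would point at the bucket
# rather than at the machine-learning container's network
curl -fI https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-B-32/textual.onnx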

uniform641 commented 12 months ago

I've uploaded my local default models for clip, facial-recognition and image-classification to Google Drive; you can download them from here. After extracting the zip file, you will need to copy these files to the location of your model-cache volume, which can typically be found in /var/lib/docker/volumes/<volume-name>/_data. Or you can find that information with:

docker volume inspect <model-cache-volume-name>
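For example, something along these lines should work (the volume name here is just an example, it depends on your compose project, and the source path is wherever you extracted the zip):

# Find where Docker keeps the model-cache volume on the host
docker volume inspect -f '{{ .Mountpoint }}' immich_model-cache

# Copy the extracted model folders into it, then restart the ML container
sudo cp -r ./model-cache-extracted/* /var/lib/docker/volumes/immich_model-cache/_data/
docker restart immich_machine_learning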


It seems that the download link is down. Due to a network issue I have to download every model manually, but I don't know the file structure of the model-cache folder or the naming rules for the models inside it. Would anyone be able to share the file structure of the model-cache folder? I would appreciate it very much.

If you are having network issues while downloading, I would recommend using a free VPN like Proton to bypass the limit temporarily.

If that doesn't work for you, I can send you the file structure later...

Thanks for your advice. To permanently solve the problem I managed to build a tproxy on the server.

acios commented 12 months ago

I checked my log and I'm having trouble downloading all the models needed by machine learning:

[11/18/23 11:03:16] INFO Initialized request thread pool with 8 threads.
[11/18/23 11:03:16] INFO Downloading facial recognition model
'buffalo_l'.This may take a while.
[11/18/23 11:05:27] INFO Downloading facial recognition model
'buffalo_l'.This may take a while.
[11/18/23 11:05:27] WARNING Failed to load facial-recognition model
'buffalo_l'.Clearing cache and retrying.
[11/18/23 11:05:27] WARNING Attempted to clear cache for model 'buffalo_l' but cache directory does not exist.
[11/18/23 11:07:38] INFO Downloading clip model 'ViT-B-32__openai'.This may take a while.
[11/18/23 11:07:38] WARNING Failed to load facial-recognition model
'buffalo_l'.Clearing cache and retrying.
[11/18/23 11:07:38] WARNING Attempted to clear cache for model 'buffalo_l' but cache directory does not exist.
[11/18/23 11:09:49] INFO Downloading image classification model
'microsoft/resnet-50'.This may take a while.
[11/18/23 11:09:49] WARNING Failed to load clip model
'ViT-B-32__openai'.Clearing cache and retrying.
[11/18/23 11:09:49] INFO Cleared cache directory for model
'ViT-B-32__openai'.

I don't know if it is a connection issue or something else; it seems like the program failed to even create the folders to save those files, not only for buffalo_l but also for the ones mentioned in the comments above. I thought manually putting those files in the cache folder might help, but it did not work, probably because I put them in the wrong place?

aviv926 commented 11 months ago

> I checked my log and I'm having trouble downloading all the models needed by machine learning

What about permissions? Do you have permission to access the folder? Please give more details about your system and your YML file.
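For instance, something like this should show who owns the cache directory and whether the container can write to it (adjust the host path to wherever your model-cache volume or bind mount lives, and the container name if yours differs):

# On the host: check ownership and permissions of the model cache
ls -ld /var/lib/docker/volumes/<model-cache-volume-name>/_data

# Inside the machine-learning container: confirm /cache exists and is writable
docker exec immich_machine_learning ls -ld /cache
docker exec immich_machine_learning touch /cache/.write-test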

acios commented 11 months ago


I don't know how to check permissions under Docker. I opened a new issue with the details of my YML files, please check:

https://github.com/immich-app/immich/issues/5134

Thanks for the help. I'm new to Linux and still learning.