immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0
[BUG] Internal server error 500 when clicking "Explore" #2035

Closed zyg117 closed 1 year ago

zyg117 commented 1 year ago

The bug

When clicking "Explore" on the web, the server generates an internal error 500.

immich-server shows this log:

[Nest] 1 - 03/21/2023, 8:36:19 AM ERROR [ExceptionsHandler] Request failed with HTTP code 404 | Server said: Not Found
ObjectNotFound: Request failed with HTTP code 404 | Server said: Not Found
    at ObjectNotFound.TypesenseError [as constructor] (/usr/src/app/node_modules/typesense/lib/Typesense/Errors/TypesenseError.js:23:28)
    at new ObjectNotFound (/usr/src/app/node_modules/typesense/lib/Typesense/Errors/ObjectNotFound.js:25:42)
    at ApiCall.customErrorForResponse (/usr/src/app/node_modules/typesense/lib/Typesense/ApiCall.js:338:21)
    at /usr/src/app/node_modules/typesense/lib/Typesense/ApiCall.js:199:98
    at step (/usr/src/app/node_modules/typesense/lib/Typesense/ApiCall.js:33:23)
    at Object.next (/usr/src/app/node_modules/typesense/lib/Typesense/ApiCall.js:14:53)
    at step (/usr/src/app/node_modules/typesense/lib/Typesense/ApiCall.js:18:139)
    at Object.next (/usr/src/app/node_modules/typesense/lib/Typesense/ApiCall.js:14:53)
    at fulfilled (/usr/src/app/node_modules/typesense/lib/Typesense/ApiCall.js:5:58)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)

The OS that Immich Server is running on

Unraid 6.11.5

Version of Immich Server

v1.51.1

Version of Immich Mobile App

v1.50.0

Platform with the issue

Your docker-compose.yml content

Standard, same as the official one.

Your .env content

Standard, same as the official one.

Reproduction steps

Updated the containers to v1.51.1
Clicked "Explore" in the web interface.

Additional information

No response

JosiahBull commented 1 year ago

May be a duplicate, see #2028

zyg117 commented 1 year ago

@JosiahBull I have already checked #2028, and this issue is different.

alextran1502 commented 1 year ago

@zyg117 Can you add LOG_LEVEL=debug to your Immich environment variable file, restart the containers, and try to access the Explore page again? Then please capture a screenshot of the log for us.

alextran1502 commented 1 year ago

@florianchevallier Can you restart the containers? It may be that the machine learning container did not start up properly.

alextran1502 commented 1 year ago

@kai23 I think your container is not up-to-date

jrasm91 commented 1 year ago

@florianchevallier you should open a separate ticket (or post on discord for faster replies) as you are running into an unrelated issue with your setup.

kai23 commented 1 year ago

On my side, it was indeed because I was still using the old repository for the Docker images. After switching to the new ones (from ghcr.io), the 404 went away.
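A quick way to spot this situation is to scan the compose file for Immich images that do not come from ghcr.io. The sketch below is only an illustration: the function name is hypothetical, the `ghcr.io/immich-app/` prefix is assumed from the image names quoted elsewhere in this thread, and the simple line-based matching assumes each `image:` entry sits on its own line.

```python
import re
from typing import List

# Assumption: the current images are published under this prefix.
EXPECTED_PREFIX = "ghcr.io/immich-app/"


def find_outdated_immich_images(compose_text: str) -> List[str]:
    """Return Immich image references that do not use the ghcr.io prefix."""
    outdated = []
    for line in compose_text.splitlines():
        match = re.match(r"\s*image:\s*(\S+)", line)
        if not match:
            continue
        image = match.group(1).strip("\"'")
        # Only Immich images are checked; redis, postgres, etc. are ignored.
        if "immich" in image and not image.startswith(EXPECTED_PREFIX):
            outdated.append(image)
    return outdated
```

Calling it on the contents of docker-compose.yml returns the image references that would need updating; an empty list means every Immich image already uses the assumed prefix.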

theautomation commented 1 year ago

I have this same issue when using the search bar.

immich-server log:

[Nest] 1  - 03/21/2023, 8:32:56 PM   ERROR [ExceptionsHandler] Request failed with status code 500
Error: Request failed with status code 500
    at createError (/usr/src/app/node_modules/axios/lib/core/createError.js:16:15)
    at settle (/usr/src/app/node_modules/axios/lib/core/settle.js:17:12)
    at IncomingMessage.handleStreamEnd (/usr/src/app/node_modules/axios/lib/adapters/http.js:322:11)
    at IncomingMessage.emit (node:events:539:35)
    at endReadableNT (node:internal/streams/readable:1345:12)
    at processTicksAndRejections (node:internal/process/task_queues:83:21)

immich-machine-learning log:

[2023-03-21 20:32:56,276] ERROR in app: Exception on /sentence-transformer/encode-text [POST]
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/flask/app.py", line 2528, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/venv/lib/python3.10/site-packages/flask/app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/venv/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/venv/lib/python3.10/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/usr/src/app/src/main.py", line 53, in clip_encode_text
    model = _get_model(clip_text_model)
  File "/usr/src/app/src/main.py", line 24, in _get_model
    _model_cache[key] = SentenceTransformer(model)
  File "/opt/venv/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 87, in __init__
    snapshot_download(model_name_or_path,
  File "/opt/venv/lib/python3.10/site-packages/sentence_transformers/util.py", line 476, in snapshot_download
    os.makedirs(nested_dirname, exist_ok=True)
  File "/usr/local/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/local/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/local/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/local/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/.cache'
10.42.1.245 - - [21/Mar/2023 20:32:56] "POST /sentence-transformer/encode-text HTTP/1.1" 500 -

twitsforbrains commented 1 year ago

@theautomation I had the same error. As you can see, it says permission denied on /.cache. I had passed a uid into this container just because I'm paranoid. Removing the uid and letting it run as root fixes the problem. The sentence-transformers library used by Immich appears to try to create this folder. Unfortunately, even when I created this folder in the container with the right permissions, I got a different error: a 405 when trying to access https://huggingface.co/api/models/sentence-transformers/clip-ViT-B-32. For simplicity, I just removed the user / uid / gid from my docker compose for this container. Hope that was your issue, and hope this helps.
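The traceback fails inside os.makedirs on '/.cache', which suggests the cache location is not writable for the UID the container runs as. A minimal sketch for checking which of several candidate directories a non-root user can actually write to is shown below. The helper names are hypothetical and the candidate paths are up to the reader; how the chosen directory is handed to sentence-transformers (for example through a cache-folder setting) depends on the library version and should be verified against its documentation.

```python
import os
import tempfile
from typing import List, Optional


def is_writable_dir(path: str) -> bool:
    """Return True if `path` exists (or can be created) and accepts new files."""
    try:
        os.makedirs(path, exist_ok=True)
        # Actually creating a file tests the real permissions of the
        # current user instead of relying on mode bits alone.
        with tempfile.TemporaryFile(dir=path):
            pass
        return True
    except OSError:
        # PermissionError, NotADirectoryError, etc. are all OSError subclasses.
        return False


def pick_cache_dir(candidates: List[str]) -> Optional[str]:
    """Return the first writable candidate directory, or None if none works."""
    for path in candidates:
        if is_writable_dir(path):
            return path
    return None
```

Run inside the container as the configured UID, `pick_cache_dir(["/.cache", "/cache", "/tmp/cache"])` would show whether any of these (illustrative) locations is usable without falling back to root.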

zyg117 commented 1 year ago

@alextran1502 Where should I set this? Is it in this configuration file? [screenshots]

If so, I have already added it and restarted as you instructed, but when I accessed the page again, the log output had not changed. Did I set something wrong?

Here is the error log output: [screenshot]

And the full log output: [screenshots]

w00tlarr commented 1 year ago

Hi - I got these errors but resolved them. Here are the steps that worked for me:

1) Make sure your docker-compose is pulling all images from the ghcr.io repo (e.g. ghcr.io/immich-app/immich-server:release) instead of from altran1502.

2) Start the containers in order. After docker-compose up -d (though I use Portainer), I manually shut down all containers and restarted them one by one in this order: immich-db, redis, typesense, then waited up to 5 minutes before starting immich-machine-learning, immich-server, and immich-microservice. The connection between typesense and immich-server is pretty tight (per the release notes), so make sure typesense is up before the server is. [I commented out the logging: driver: none section for typesense in docker-compose so I could monitor it in the docker logs.] It took a lot of trial and error to get all containers running healthily. If someone can suggest a docker-compose method to check for dependencies plus health checks (not just container starts), please do. Thanks in advance.

3) Once everything is up, go to the Immich Web UI > Administration > Encode Clip > All (button). Mine ran overnight (over 100K photos/videos). Searching before this is done will give the 500 errors (404 in the immich-server logs). You'll need patience with large libraries.

4) Note: I am still getting the errors below, but my search works now.

Immich-Microservice logs:

[Nest] 1 - 03/22/2023, 8:58:30 AM ERROR [SmartInfoService] Unable run clip encoding pipeline: 795e0760-3aa3-483c-a52b-70dbbe5e1e02
QueryFailedError: numeric field overflow
    at PostgresQueryRunner.query (/usr/src/app/node_modules/typeorm/driver/postgres/PostgresQueryRunner.js:211:19)
    at runMicrotasks ()
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async InsertQueryBuilder.execute (/usr/src/app/node_modules/typeorm/query-builder/InsertQueryBuilder.js:106:33)
    at async SmartInfoRepository.upsert (/usr/src/app/dist/apps/microservices/libs/infra/src/db/repository/smart-info.repository.js:25:9)
    at async SmartInfoService.handleEncodeClip (/usr/src/app/dist/apps/microservices/libs/domain/src/smart-info/smart-info.service.js:97:13)
    at async ClipEncodingProcessor.onEncodeClip (/usr/src/app/dist/apps/microservices/apps/microservices/src/processors.js:112:9)

Immich-Postgres logs:

2023-03-22 12:58:30.591 UTC [472] ERROR: numeric field overflow
2023-03-22 12:58:30.591 UTC [472] DETAIL: A field with precision 20, scale 19 must round to an absolute value less than 10^1.
2023-03-22 12:58:30.591 UTC [472] STATEMENT: INSERT INTO "smart_info"("assetId", "tags", "objects", "clipEmbedding") VALUES ($1, DEFAULT, DEFAULT, $2) ON CONFLICT ( "assetId" ) DO UPDATE SET "assetId" = EXCLUDED."assetId", "clipEmbedding" = EXCLUDED."clipEmbedding"

Lastly THANK YOU! for such a wonderful feature. Loving it so far! Just growing pains. ;-)
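On the question of waiting for a dependency to be healthy rather than merely started: one possible approach is a small polling loop that blocks until a health check succeeds. The sketch below is not part of Immich. The function names are hypothetical, and the Typesense check assumes a service reachable at http://typesense:8108 that answers GET /health with {"ok": true}; adjust both to the actual setup.

```python
import json
import time
import urllib.request
from typing import Callable


def wait_until_ready(check: Callable[[], bool], timeout: float = 300.0,
                     interval: float = 2.0) -> bool:
    """Poll `check` until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except (OSError, ValueError):
            # Connection refused or an unparsable reply: the service is
            # probably still starting, so keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def typesense_is_healthy(base_url: str = "http://typesense:8108") -> bool:
    """Assumed health probe: GET /health is expected to return {"ok": true}."""
    with urllib.request.urlopen(base_url + "/health", timeout=5) as response:
        return json.load(response).get("ok") is True
```

A start script could call `wait_until_ready(typesense_is_healthy)` and only launch immich-server when it returns True, which replaces the manual "wait up to 5 minutes" step with an explicit check.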

jrasm91 commented 1 year ago

Thanks!

We're working to make the start order/logic between typesense and the other containers more flexible.

The overflow has shown up a few times. If you could grab the thumbnail for that asset, I would be interested to know if it is a valid image. If you could find and upload it, we can troubleshoot that issue. Ultimately it means that one won't show up in clip searches, but everything else should continue to work as expected.
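For context on the Postgres message: a field with precision 20 and scale 19 leaves a single digit before the decimal point, so any value whose absolute value rounds to 10 or more is rejected. The sketch below mirrors that rule with Python's decimal module so that suspicious values can be checked offline. The function name is hypothetical, rounding ties away from zero is an assumption about the database's behaviour, and treating NaN or infinity as "does not fit" is a simplification.

```python
import math
from decimal import Decimal, ROUND_HALF_UP, localcontext


def fits_numeric(value: float, precision: int = 20, scale: int = 19) -> bool:
    """Return True if `value` would fit a NUMERIC(precision, scale) field."""
    if not math.isfinite(value):
        return False  # simplification: NaN and infinity are treated as misfits
    with localcontext() as ctx:
        ctx.prec = 400  # enough digits to quantize any finite float exactly
        quantum = Decimal(1).scaleb(-scale)  # 10 ** -scale
        rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
        # Only (precision - scale) digits are allowed before the decimal point.
        return abs(rounded) < Decimal(10) ** (precision - scale)
```

Applied to each element of an embedding, for instance `[x for x in embedding if not fits_numeric(x)]`, this would list the values that trigger the overflow, which could help confirm whether the affected asset produces out-of-range numbers.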

theautomation commented 1 year ago

@twitsforbrains Thanks for this tip :). I have removed the UID and GID, and yes, search works now. However, this workaround goes in the wrong direction: I would like to keep using the UID and GID for security reasons, but most of all because the user and group with those IDs own the NFS share I have mounted into the container. So searching works with this workaround, but now I am seeing errors when I start an "ENCODE CLIP" job.

[2023-03-23 20:04:41,843] ERROR in app: Exception on /image-classifier/tag-image [POST]
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/flask/app.py", line 2528, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/venv/lib/python3.10/site-packages/flask/app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/venv/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/venv/lib/python3.10/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/usr/src/app/src/main.py", line 43, in image_classification
    return run_engine(model, assetPath), 200
  File "/usr/src/app/src/main.py", line 59, in run_engine
    predictions = engine(path)
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/image_classification.py", line 100, in __call__
    return super().__call__(images, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1084, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1090, in run_single
    model_inputs = self.preprocess(inputs, **preprocess_params)
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/image_classification.py", line 103, in preprocess
    image = load_image(image)
  File "/opt/venv/lib/python3.10/site-packages/transformers/image_utils.py", line 232, in load_image
    raise ValueError(
ValueError: Incorrect path or url, URLs must start with `http://` or `https://`, and upload/5c86787b-cae9-4382-b7ca-cc0075d77b96/thumb/CLI/b8e4adf4-76ee-41ac-8379-e921c6182229.jpeg is not a valid path

Now I don't know whether this error is caused by removing the UID and GID or by something else. By the way, I am running this in Kubernetes. This is my deployment YAML for the machine learning container/pod.

jagjordi commented 1 year ago

I am also seeing this error when running encode clip or tag objects:

172.25.0.7 - - [24/Mar/2023:15:33:03 +0000] "POST /image-classifier/tag-image HTTP/1.1" 500 265 "-" "axios/0.26.1"
[2023-03-24 15:33:03,567] ERROR in app: Exception on /object-detection/detect-object [POST]
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/flask/app.py", line 2528, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/venv/lib/python3.10/site-packages/flask/app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/venv/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/venv/lib/python3.10/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/usr/src/app/src/main.py", line 37, in object_detection
    return run_engine(model, assetPath), 200
  File "/usr/src/app/src/main.py", line 59, in run_engine
    predictions = engine(path)
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/object_detection.py", line 95, in __call__
    return super().__call__(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1084, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1090, in run_single
    model_inputs = self.preprocess(inputs, **preprocess_params)
  File "/opt/venv/lib/python3.10/site-packages/transformers/pipelines/object_detection.py", line 98, in preprocess
    image = load_image(image)
  File "/opt/venv/lib/python3.10/site-packages/transformers/image_utils.py", line 232, in load_image
    raise ValueError(
ValueError: Incorrect path or url, URLs must start with `http://` or `https://`, and upload/a8e843dc-2b16-4175-b247-8ee5539f08ab/thumb/CLI/da09f667-6bd8-41a3-90a8-093b2306b5bc.jpeg is not a valid path

jrasm91 commented 1 year ago

@jagjordi - can you verify that the UPLOAD_LOCATION volume is mounted to the machine learning container?

jagjordi commented 1 year ago

@jrasm91 It looks like it is mounted:

root@Tower:~# docker exec -it immich_machine_learning bash
root@3fb4fa0cb6be:/usr/src/app# ls
Dockerfile  README.md  gunicorn.conf.py  src  upload
root@3fb4fa0cb6be:/usr/src/app# cd upload/
root@3fb4fa0cb6be:/usr/src/app/upload# ls
1dd38e30-8f47-4129-bcd0-36e6537dfa54  5001418f-1e48-4a12-81c9-10ff90defb78
21500b77-b342-4707-acf0-b6f1daa55577  9d015ed5-e62c-4dac-b6f6-3f5f25c73e00
root@3fb4fa0cb6be:/usr/src/app/upload# cd 9d015ed5-e62c-4dac-b6f6-3f5f25c73e00/
root@3fb4fa0cb6be:/usr/src/app/upload/9d015ed5-e62c-4dac-b6f6-3f5f25c73e00# ls
1980  2007  2010  2013  2016  2019  2022           original
2003  2008  2011  2014  2017  2020  2023           profile
2004  2009  2012  2015  2018  2021  encoded-video  thumb
root@3fb4fa0cb6be:/usr/src/app/upload/9d015ed5-e62c-4dac-b6f6-3f5f25c73e00# 

docker_compose.yml (ML section)

  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:release
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - model-cache:/cache
    env_file:
      - .env
    environment:
      - NODE_ENV=production
    restart: always

It seems to be correctly mounted into /usr/src/app/upload
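The path in the ValueError is relative (upload/...), so it is resolved against the working directory of the machine learning process. It may also be worth noting that the directory listing above shows four user folders under upload, none of which is the a8e843dc-... folder named in the error message. A small sketch for finding the first path component that does not exist is given below; the function name is hypothetical, and /usr/src/app as the base directory is an assumption taken from the shell prompt in the listing.

```python
import os
from typing import Optional


def first_missing_component(relative_path: str, base_dir: str) -> Optional[str]:
    """Return the first prefix of `relative_path` under `base_dir` that is missing.

    Returns None when the complete path exists.
    """
    current = base_dir
    for part in relative_path.split("/"):
        if not part:
            continue  # skip empty segments such as a doubled slash
        current = os.path.join(current, part)
        if not os.path.exists(current):
            return current
    return None
```

Running it inside the container with the path from the error message and `/usr/src/app` as `base_dir` would show whether the user folder, the thumb folder, or only the file itself is missing.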

alextran1502 commented 1 year ago

This would happen if the system has a non-AVX CPU and tries to use Typesense search. The workaround is to disable Typesense.
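To check whether a host falls into this category, the CPU flags can be inspected. The following sketch parses /proc/cpuinfo-style text and looks for the avx flag. The function names are hypothetical, and the approach assumes a Linux x86 host where the flags are listed on lines starting with "flags".

```python
def cpu_has_avx(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists avx."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx" in value.split():
            return True
    return False


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the CPU description file (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

On the host running the containers, `cpu_has_avx(read_cpuinfo())` returning False would be consistent with the explanation above.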