immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app

[BUG] Some photos are stuck in the upload folder after bulk upload #3801

Closed MMauro94 closed 1 year ago

MMauro94 commented 1 year ago

The bug

I've just installed a fresh instance of Immich on my server and tried to import 875 assets for the year 2022 via the CLI (from the immich-server container).
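
For reference, the bulk import was done with the Immich CLI's upload command. The invocation looked roughly like the following; the API key and server URL are placeholders, the exact flags can differ between CLI versions, and /media/import1 is one of the read-only mounts from the compose file further down:

```
immich upload --key <API_KEY> --server http://<immich-host>:2283/api --recursive /media/import1/<2022-folder>
```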

Everything went OK, but I still see three images stuck in the upload folder:

root@7e32564415e3:/usr/src/app# ls -la /usr/src/app/upload/upload/af0578da-7a71-41d7-a9d8-1726c777b98b/
total 11788
drwxr-xr-x 2 root root    4096 Aug 20 01:29 .
drwxr-xr-x 3 root root    4096 Aug 17 20:20 ..
-rw-r--r-- 1 root root 3542874 Aug 20 01:21 1033466c-5ffd-44d2-a919-487cf9817502.jpg
-rw-r--r-- 1 root root 4238805 Aug 20 01:21 22a5450c-046b-4f87-a8ae-6ad7523ce9d2.jpg
-rw-r--r-- 1 root root 4279550 Aug 20 01:21 d304fd6f-dc12-453d-b76f-d2993fdcaba1.jpg

If I find the photos in the web UI, they appear with a broken thumbnail. Trying to download them also fails, as the server expects them to be in a different location:

[Nest] 7  - 08/20/2023, 11:16:22 AM   ERROR [ExceptionsHandler] ENOENT: no such file or directory, stat 'upload/library/mmauro/2022/02 - February/2022-02-05 20.35.11.jpg'
Container logs

#### Microservices container

```
[Nest] 7 - 08/20/2023, 1:21:19 AM WARN [StorageTemplateService] Unable to save new originalPath to database, undoing move for path upload/upload/af0578da-7a71-41d7-a9d8-1726c777b98b/22a5450c-046b-4f87-a8ae-6ad7523ce9d2.jpg - filename 2022-05-13 11.30.16 - id 61ba3dac-f8a8-43db-b841-5c55dd6f17b7
[Nest] 7 - 08/20/2023, 1:21:19 AM WARN [StorageTemplateService] QueryFailedError: duplicate key value violates unique constraint "UQ_4ed4f8052685ff5b1e7ca1058ba"
    at PostgresQueryRunner.query (/usr/src/app/node_modules/typeorm/driver/postgres/PostgresQueryRunner.js:211:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async UpdateQueryBuilder.execute (/usr/src/app/node_modules/typeorm/query-builder/UpdateQueryBuilder.js:83:33)
    at async updateSubject (/usr/src/app/node_modules/typeorm/persistence/SubjectExecutor.js:380:38)
    at async Promise.all (index 0)
    at async SubjectExecutor.executeUpdateOperations (/usr/src/app/node_modules/typeorm/persistence/SubjectExecutor.js:422:9)
    at async SubjectExecutor.execute (/usr/src/app/node_modules/typeorm/persistence/SubjectExecutor.js:99:9)
    at async EntityPersistExecutor.execute (/usr/src/app/node_modules/typeorm/persistence/EntityPersistExecutor.js:140:21)
    at async AssetRepository.save (/usr/src/app/dist/infra/repositories/asset.repository.js:116:24)
    at async StorageTemplateService.moveAsset (/usr/src/app/dist/domain/storage-template/storage-template.service.js:142:21)
```

```
[Nest] 7 - 08/20/2023, 1:21:19 AM ERROR [JobService] Unable to run job handler: Error: Input file is missing: upload/library/mmauro/2022/05 - May/2022-05-13 11.30.16.jpg
[Nest] 7 - 08/20/2023, 1:21:19 AM ERROR [JobService] Error: Input file is missing: upload/library/mmauro/2022/05 - May/2022-05-13 11.30.16.jpg
[Nest] 7 - 08/20/2023, 1:21:19 AM ERROR [JobService] Object: { "id": "eb8f9d5f-faa1-4aa5-b89e-f7f6d635aec6", "source": "upload" }
```

#### Postgres container

```
2023-08-20 01:21:05.905 UTC [1488] ERROR: duplicate key value violates unique constraint "UQ_4ed4f8052685ff5b1e7ca1058ba"
2023-08-20 01:21:05.905 UTC [1488] DETAIL: Key ("originalPath")=(upload/library/mmauro/2022/02 - February/2022-02-05 20.35.11.jpg) already exists.
2023-08-20 01:21:05.905 UTC [1488] STATEMENT: UPDATE "assets" SET "originalPath" = $1, "updatedAt" = CURRENT_TIMESTAMP WHERE "id" IN ($2) RETURNING "updatedAt"
2023-08-20 01:21:19.182 UTC [1488] ERROR: duplicate key value violates unique constraint "UQ_4ed4f8052685ff5b1e7ca1058ba"
2023-08-20 01:21:19.182 UTC [1488] DETAIL: Key ("originalPath")=(upload/library/mmauro/2022/05 - May/2022-05-13 11.30.16.jpg) already exists.
2023-08-20 01:21:19.182 UTC [1488] STATEMENT: UPDATE "assets" SET "originalPath" = $1, "updatedAt" = CURRENT_TIMESTAMP WHERE "id" IN ($2) RETURNING "updatedAt"
2023-08-20 01:21:22.950 UTC [1489] ERROR: duplicate key value violates unique constraint "UQ_4ed4f8052685ff5b1e7ca1058ba"
2023-08-20 01:21:22.950 UTC [1489] DETAIL: Key ("originalPath")=(upload/library/mmauro/2022/05 - May/2022-05-13 14.16.24.jpg) already exists.
2023-08-20 01:21:22.950 UTC [1489] STATEMENT: UPDATE "assets" SET "originalPath" = $1, "updatedAt" = CURRENT_TIMESTAMP WHERE "id" IN ($2) RETURNING "updatedAt"
```

After snooping around in the DB, I found that these rows are indeed there. Sadly, the originalPath is invalid, as those photos are still stuck in the upload folder.

| id | deviceAssetId | ownerId | deviceId | type | originalPath | resizePath | fileCreatedAt | fileModifiedAt | isFavorite | duration | webpPath | encodedVideoPath | checksum | isVisible | livePhotoVideoId | updatedAt | createdAt | isArchived | originalFileName | sidecarPath | isReadOnly | thumbhash |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 98b7d954-6779-47bf-a689-6decd026cb2d | 2022-05-1314.16.24_1.jpg-4254619 | af0578da-7a71-41d7-a9d8-1726c777b98b | CLI | IMAGE | upload/library/mmauro/2022/05 - May/2022-05-13 14.16.24.jpg | | 2022-05-13 12:16:24+00 | 2022-05-13 12:16:26+00 | f | "0:00:00.000000" | | | \xd7cf62b4ad113d903c40130c7f8099ce5d166bd2 | t | | 2023-08-20 01:21:22.883001+00 | 2023-08-20 01:21:22.740879+00 | f | 2022-05-13 14.16.24_1 | | f | |
| eb8f9d5f-faa1-4aa5-b89e-f7f6d635aec6 | 2022-05-1311.30.16_1.jpg-4263447 | af0578da-7a71-41d7-a9d8-1726c777b98b | CLI | IMAGE | upload/library/mmauro/2022/05 - May/2022-05-13 11.30.16.jpg | | 2022-05-13 09:30:16+00 | 2022-05-13 09:30:22+00 | f | "0:00:00.000000" | | | \xbe3e8fd691789a0e99540e42fda712810701c980 | t | | 2023-08-20 01:21:19.108137+00 | 2023-08-20 01:21:18.924031+00 | f | 2022-05-13 11.30.16_1 | | f | |
| 6a3b1e8c-65f5-450a-af9d-3aef4c9ed0f6 | 2022-02-0520.35.11.jpg-2559987 | af0578da-7a71-41d7-a9d8-1726c777b98b | CLI | IMAGE | upload/library/mmauro/2022/02 - February/2022-02-05 20.35.11.jpg | | 2022-02-05 19:35:11+00 | 2022-02-05 19:35:14+00 | f | "0:00:00.000000" | | | \x09427201330a11a09a04bfc66387be1f95640f21 | t | | 2023-08-20 01:21:05.809499+00 | 2023-08-20 01:21:05.502502+00 | f | 2022-02-05 20.35.11 | | f | |

But I also noticed - if you check the table above - that the IDs of the images don't match. So I dug a little deeper, and by running this query:

SELECT * FROM "assets" WHERE "originalPath" LIKE '%upload/upload%'

I found this out:

| id | deviceAssetId | ownerId | deviceId | type | originalPath | resizePath | fileCreatedAt | fileModifiedAt | isFavorite | duration | webpPath | encodedVideoPath | checksum | isVisible | livePhotoVideoId | updatedAt | createdAt | isArchived | originalFileName | sidecarPath | isReadOnly | thumbhash |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 61ba3dac-f8a8-43db-b841-5c55dd6f17b7 | 2022-05-1311.30.16.jpg-4238805 | af0578da-7a71-41d7-a9d8-1726c777b98b | CLI | IMAGE | upload/upload/af0578da-7a71-41d7-a9d8-1726c777b98b/22a5450c-046b-4f87-a8ae-6ad7523ce9d2.jpg | upload/thumbs/af0578da-7a71-41d7-a9d8-1726c777b98b/61ba3dac-f8a8-43db-b841-5c55dd6f17b7.jpeg | 2022-05-13 09:30:16+00 | 2022-05-13 09:30:24+00 | f | "0:00:00.000000" | upload/thumbs/af0578da-7a71-41d7-a9d8-1726c777b98b/61ba3dac-f8a8-43db-b841-5c55dd6f17b7.webp | | \x7015a4a3a33c238815e1bdcd840387b6369d75ac | t | | 2023-08-20 01:21:20.064026+00 | 2023-08-20 01:21:18.926542+00 | f | 2022-05-13 11.30.16 | | f | \x5ee70915846057962f7986ab99688b867feaf3984f |
| 2594ba72-341b-41c5-ad96-77f203d27839 | 2022-02-0520.35.11_1.jpg-3542874 | af0578da-7a71-41d7-a9d8-1726c777b98b | CLI | IMAGE | upload/upload/af0578da-7a71-41d7-a9d8-1726c777b98b/1033466c-5ffd-44d2-a919-487cf9817502.jpg | upload/thumbs/af0578da-7a71-41d7-a9d8-1726c777b98b/2594ba72-341b-41c5-ad96-77f203d27839.jpeg | 2022-02-05 19:35:11+00 | 2022-02-05 19:35:15+00 | f | "0:00:00.000000" | upload/thumbs/af0578da-7a71-41d7-a9d8-1726c777b98b/2594ba72-341b-41c5-ad96-77f203d27839.webp | | \x8b52c075608aafe96ba14f7883dd553dfab682b0 | t | | 2023-08-20 01:21:07.127282+00 | 2023-08-20 01:21:05.522287+00 | f | 2022-02-05 20.35.11_1 | | f | \x56080a0d00a76c9ac095336b666749387f3ae46610 |
| 7f9068ec-4c29-4841-a111-7564de7d7701 | 2022-05-1314.16.24.jpg-4279550 | af0578da-7a71-41d7-a9d8-1726c777b98b | CLI | IMAGE | upload/upload/af0578da-7a71-41d7-a9d8-1726c777b98b/d304fd6f-dc12-453d-b76f-d2993fdcaba1.jpg | upload/thumbs/af0578da-7a71-41d7-a9d8-1726c777b98b/7f9068ec-4c29-4841-a111-7564de7d7701.jpeg | 2022-05-13 12:16:24+00 | 2022-05-13 12:16:28+00 | f | "0:00:00.000000" | upload/thumbs/af0578da-7a71-41d7-a9d8-1726c777b98b/7f9068ec-4c29-4841-a111-7564de7d7701.webp | | \xd41302cb7aa6ad0239f90d7af11028982209b667 | t | | 2023-08-20 01:21:26.340961+00 | 2023-08-20 01:21:22.754042+00 | f | 2022-05-13 14.16.24 | | f | \xdcf7050d82dbb5208966869b8885ba98597f877046 |

These are the photos that are stuck in the upload folder, and they actually show up fine in the UI. The images that are actually broken are the ones from the first table, which are nowhere to be found in the Immich folders.

I later noticed that these photos were all taken very close together: there are 3 pairs of photos, and each pair was shot in the exact same second.

My storage template is:

{{y}}/{{MM}} - {{MMMM}}/{{y}}-{{MM}}-{{dd}} {{HH}}.{{mm}}.{{ss}}

What I think is happening is that the 3 photos in the upload folder cannot be moved because a file with the same templated name doesn't exist on disk yet (so the _1 suffix is not added), while the DB already has a row with that path - hence the duplicate key failure.
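
To make the race I have in mind concrete, here is a minimal TypeScript sketch of the failure mode. This is not the actual Immich code; moveAssetSketch and the save callback are made-up names, and the logic is only my guess at what the storage template move does. Both photos of a pair render to the same templated path (e.g. 2022/05 - May/2022-05-13 11.30.16.jpg, as seen in the logs), the on-disk existence check can pass for both before either file has been moved, and the loser of the DB update is rolled back into the upload folder:

```typescript
// Hypothetical sketch of the suspected race - NOT the actual Immich implementation.
import { existsSync } from 'node:fs';
import { rename } from 'node:fs/promises';

type Asset = { id: string; originalPath: string };
// Made-up stand-in for the repository call that is guarded by UNIQUE("originalPath").
type SaveOriginalPath = (id: string, originalPath: string) => Promise<void>;

async function moveAssetSketch(asset: Asset, templatedPath: string, save: SaveOriginalPath) {
  // Deduplication seems to be based on what exists on DISK: a _1/_2/... suffix is only
  // added if the target file is already there. Two assets shot in the same second that
  // are processed concurrently can both see "no file yet" and pick the same target.
  let target = templatedPath; // e.g. "upload/library/mmauro/2022/05 - May/2022-05-13 11.30.16.jpg"
  for (let i = 1; existsSync(target); i++) {
    target = templatedPath.replace(/(\.[^.]+)$/, `_${i}$1`);
  }

  // POSIX rename() silently replaces an existing destination file.
  await rename(asset.originalPath, target);
  try {
    await save(asset.id, target); // the loser hits "duplicate key ... UQ_4ed4f8052685ff5b1e7ca1058ba"
  } catch (err) {
    // Matches the "undoing move" warning: the file goes back to upload/upload/...
    await rename(target, asset.originalPath);
    throw err;
  }
}
```

If the second asset's rename lands on top of the first asset's already-moved file, the first original would be silently overwritten, and the rollback would then move the surviving file back to the upload folder, leaving nothing at the library path - which would also explain why the files from the first table are nowhere to be found.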

What I find more worrying, though, is that it appears 3 photos have disappeared into thin air. I can confirm this with the following calculations:

I haven't been able to dig into the code to understand exactly what's happening, but I hope all this info brings you closer to catching this sneaky bug. (I've seen some similar reports, but I preferred to open this issue since I had a bunch of info to share.)

Let me know if there's any more information I can provide that may help.

Cheers :)

The OS that Immich Server is running on

Debian 12.0

Version of Immich Server

v1.74.0

Version of Immich Mobile App

N/A

Platform with the issue

Your docker-compose.yml content

version: "3.8"

services:
  immich-server:
    container_name: immich-server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: [ "start.sh", "immich" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - "/media/pool/photos_to_import_1:/media/import1:ro"
      - "/media/pool/photos_to_import_2:/media/import2:ro"
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always
    labels:
      - "docker-volume-backup.stop-during-backup=immich"

  immich-microservices:
    container_name: immich-microservices
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: [ "start.sh", "microservices" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always
    labels:
      - "docker-volume-backup.stop-during-backup=immich"

  immich-machine-learning:
    container_name: immich-machine-learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always
    labels:
      - "docker-volume-backup.stop-during-backup=immich"

  immich-web:
    container_name: immich-web
    image: ghcr.io/immich-app/immich-web:${IMMICH_VERSION:-release}
    env_file:
      - .env
    restart: always
    labels:
      - "docker-volume-backup.stop-during-backup=immich"

  typesense:
    container_name: immich-typesense
    image: typesense/typesense:0.24.1
    environment:
      - TYPESENSE_API_KEY=${TYPESENSE_API_KEY}
      - TYPESENSE_DATA_DIR=/data
    volumes:
      - tsdata:/data
    restart: always
    labels:
      - "docker-volume-backup.stop-during-backup=immich"

  redis:
    container_name: immich-redis
    image: redis:6.2-alpine
    restart: always
    labels:
      - "docker-volume-backup.stop-during-backup=immich"

  database:
    container_name: immich-postgres
    image: postgres:14-alpine
    env_file:
      - .env
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: always
    labels:
      - "docker-volume-backup.stop-during-backup=immich"

  immich-proxy:
    container_name: immich-proxy
    image: ghcr.io/immich-app/immich-proxy:${IMMICH_VERSION:-release}
    environment:
      # Make sure these values get passed through from the env file
      - IMMICH_SERVER_URL
      - IMMICH_WEB_URL
    depends_on:
      - immich-server
      - immich-web
    restart: always
    labels:
      - homepage.group=Home
      - homepage.name=Immich
      - homepage.icon=immich
      - homepage.href=https://photos.mmauro.page
      - homepage.weight=3
      - homepage.description=Photos management
      - homepage.widget.type=immich
      - homepage.widget.url=http://immich-proxy:8080
      - homepage.widget.key=$IMMICH_API_KEY
      - homepage.widget.fields=["photos","videos"]
      - "docker-volume-backup.stop-during-backup=immich"

  immich-adminer:
    container_name: immich-adminer
    image: adminer
    restart: always
    depends_on:
      - database

  immich-backup:
    extends:
      file: ../.base/backup.yml
      service: backup
    container_name: immich-backup
    environment:
      - BACKUP_STOP_CONTAINER_LABEL=immich
    volumes:
      - /media/pool/STORAGE/docker/backups/immich:/archive
      - pgdata:/backup/immich-db:ro
    restart: always

volumes:
  pgdata:
    name: immich-db
  model-cache:
    name: immich-model-cache
  tsdata:
    name: immich-typesense

networks:
  default:
    external:
      name: web_services

Your .env content

DB_HOSTNAME=immich-postgres
DB_USERNAME=postgres
DB_PASSWORD=REDACTED
DB_DATABASE_NAME=immich
REDIS_HOSTNAME=immich-redis
UPLOAD_LOCATION=/media/pool/STORAGE/immich
TYPESENSE_API_KEY=REDACTED
PUBLIC_LOGIN_PAGE_MESSAGE=

IMMICH_WEB_URL=http://immich-web:3000
IMMICH_SERVER_URL=http://immich-server:3001
IMMICH_MACHINE_LEARNING_URL=http://immich-machine-learning:3003

LOG_LEVEL=simple

IMMICH_VERSION=v1.74.0

Reproduction steps

1. Apply the storage template from above
2. Use the CLI to upload a bunch of photos, many of which were shot within the same second of one another.

As this seems to be some kind of synchronization problem, it may not appear on the first try and may also depend on server power, disk speed, etc.

Additional information

I recently made a big upgrade to my server. The old CPU was really underpowered (a 2013 Intel CPU with a PassMark score of about 1500), while the new one is a Ryzen 7 7700 (PassMark 35266).

I noticed that on my old server the problem seemed to happen more frequently, so this may be a race between two components: if they run in the wrong order, this situation happens.

Also worth noting: the data sits on a JBOD array of spinning disks with mergerfs on top, so this could also be triggered when some disk operation isn't fast enough.

alextran1502 commented 1 year ago

Thank you for reporting; this issue is being tracked in #2877