immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

[BUG] CLI Upload Repeatedly Tries to Upload - duplicate key violations in postgres logs #3615

Closed: tidalvirus closed this issue 1 year ago

tidalvirus commented 1 year ago

The bug

When running the CLI (via a docker / bash alias, as documented here), I expect a repeat upload of the same directory to skip the files that were already uploaded. However, for some directories, every run attempts to upload the same number of files again, and those uploads fail because of a duplicate key check in postgres.

Is this expected behaviour? I'm a new user to Immich - my rough timeline of usage:

- ~72 hours ago - initial setup and uploading from my iOS phone - only photos taken in the last 3 months.
- ~48 hours ago - adding another album on my phone (an iCloud shared photo album) - this is still not fully uploaded. It includes photos from years ago (I think this might be relevant, and the cause of the duplicates).
- ~24 hours ago - doing a first upload from the CLI, for older photos that mostly weren't on my phone, which I left running overnight - I accidentally didn't run with --recursive, but had a lot (~6000) of photos in the base directory. I didn't pay attention to whether there were duplicates.
- ~2-3 hours ago - trying to do more CLI based uploads, and noticed this weirdness with repeated attempts to upload via CLI.

I saw issue #975 - but it doesn't seem to be exactly this situation.

The OS that Immich Server is running on

Docker on Debian 12 (amd64)

Version of Immich Server

v1.72.2

Version of Immich Mobile App

v1.72.0

Platform with the issue

Your docker-compose.yml content

version: "3.8"

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: [ "start.sh", "immich" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always

  immich-microservices:
    container_name: immich_microservices
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    # extends:
    #   file: hwaccel.yml
    #   service: hwaccel
    command: [ "start.sh", "microservices" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always

  immich-web:
    container_name: immich_web
    image: ghcr.io/immich-app/immich-web:${IMMICH_VERSION:-release}
    env_file:
      - .env
    restart: always

  typesense:
    container_name: immich_typesense
    image: typesense/typesense:0.24.1@sha256:9bcff2b829f12074426ca044b56160ca9d777a0c488303469143dd9f8259d4dd
    environment:
      - TYPESENSE_API_KEY=${TYPESENSE_API_KEY}
      - TYPESENSE_DATA_DIR=/data
    volumes:
      - tsdata:/data
    restart: always

  redis:
    container_name: immich_redis
    image: redis:6.2-alpine@sha256:70a7a5b641117670beae0d80658430853896b5ef269ccf00d1827427e3263fa3
    restart: always

  database:
    container_name: immich_postgres
    image: postgres:14-alpine@sha256:28407a9961e76f2d285dc6991e8e48893503cc3836a4755bbc2d40bcc272a441
    env_file:
      - .env
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: always

  immich-proxy:
    container_name: immich_proxy
    image: ghcr.io/immich-app/immich-proxy:${IMMICH_VERSION:-release}
    environment:
      # Make sure these values get passed through from the env file
      - IMMICH_SERVER_URL
      - IMMICH_WEB_URL
    ports:
      - 2283:8080
    depends_on:
      - immich-server
      - immich-web
    restart: always

volumes:
  pgdata:
  model-cache:
  tsdata:

Your .env content

DB_HOSTNAME=immich_postgres
DB_USERNAME=postgres
DB_PASSWORD=<key>
DB_DATABASE_NAME=immich
REDIS_HOSTNAME=immich_redis
UPLOAD_LOCATION=/mnt/immich
TYPESENSE_API_KEY=<key>
PUBLIC_LOGIN_PAGE_MESSAGE="Mindless Immich"
IMMICH_WEB_URL=http://immich-web:3000
IMMICH_SERVER_URL=http://immich-server:3001
IMMICH_MACHINE_LEARNING_URL=http://immich-machine-learning:3003
IMMICH_API_URL_EXTERNAL=https://immich.mindless.co.uk

Reproduction steps

1. I'm unsure how to replicate this - as it doesn't happen with every folder. I think the only way I can truly replicate this is to start from a clean slate and try again - which I could do if it's useful!

Additional information

immich_postgres            | 2023-08-09 04:05:18.732 UTC [9039] ERROR:  duplicate key value violates unique constraint "UQ_userid_checksum"
immich_postgres            | 2023-08-09 04:05:18.732 UTC [9039] DETAIL:  Key ("ownerId", checksum)=(b1e4f17d-1885-41f5-904f-3b5b14485c98, \x2732f36a09ebaa44e2740681de030604f1b3dbaf) already exists.
immich_postgres            | 2023-08-09 04:05:18.732 UTC [9039] STATEMENT:  INSERT INTO "assets"("id", "deviceAssetId", "ownerId", "deviceId", "type", "originalPath", "resizePath", "webpPath", "thumbhash", "encodedVideoPath", "createdAt", "updatedAt", "fileCreatedAt", "fileModifiedAt", "isFavorite", "isArchived", "isReadOnly", "checksum", "duration", "isVisible", "livePhotoVideoId", "originalFileName", "sidecarPath") VALUES (DEFAULT, $1, $2, $3, $4, $5, $6, $7, $8, $9, DEFAULT, DEFAULT, $10, $11, $12, $13, $14, $15, $16, $17, DEFAULT, $18, $19) RETURNING "id", "webpPath", "encodedVideoPath", "createdAt", "updatedAt", "isFavorite", "isArchived", "isReadOnly", "isVisible"
immich_postgres            | 2023-08-09 04:05:18.765 UTC [9039] ERROR:  duplicate key value violates unique constraint "UQ_userid_checksum"
immich_postgres            | 2023-08-09 04:05:18.765 UTC [9039] DETAIL:  Key ("ownerId", checksum)=(b1e4f17d-1885-41f5-904f-3b5b14485c98, \x2732f36a09ebaa44e2740681de030604f1b3dbaf) already exists.
immich_postgres            | 2023-08-09 04:05:18.765 UTC [9039] STATEMENT:  INSERT INTO "assets"("id", "deviceAssetId", "ownerId", "deviceId", "type", "originalPath", "resizePath", "webpPath", "thumbhash", "encodedVideoPath", "createdAt", "updatedAt", "fileCreatedAt", "fileModifiedAt", "isFavorite", "isArchived", "isReadOnly", "checksum", "duration", "isVisible", "livePhotoVideoId", "originalFileName", "sidecarPath") VALUES (DEFAULT, $1, $2, $3, $4, $5, $6, $7, $8, $9, DEFAULT, DEFAULT, $10, $11, $12, $13, $14, $15, $16, $17, DEFAULT, $18, $19) RETURNING "id", "webpPath", "encodedVideoPath", "createdAt", "updatedAt", "isFavorite", "isArchived", "isReadOnly", "isVisible"
typesense_1                | I20230809 04:05:27.819044   646 raft_server.cpp:546] Term: 2, last_index index: 15135, committed_index: 15135, known_applied_index: 15135, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 138993
typesense_1                | I20230809 04:05:27.819183   864 raft_server.h:60] Peer refresh succeeded!
immich_postgres            | 2023-08-09 04:05:31.292 UTC [9040] ERROR:  duplicate key value violates unique constraint "UQ_userid_checksum"
immich_postgres            | 2023-08-09 04:05:31.292 UTC [9040] DETAIL:  Key ("ownerId", checksum)=(b1e4f17d-1885-41f5-904f-3b5b14485c98, \x2732f36a09ebaa44e2740681de030604f1b3dbaf) already exists.
immich_postgres            | 2023-08-09 04:05:31.292 UTC [9040] STATEMENT:  INSERT INTO "assets"("id", "deviceAssetId", "ownerId", "deviceId", "type", "originalPath", "resizePath", "webpPath", "thumbhash", "encodedVideoPath", "createdAt", "updatedAt", "fileCreatedAt", "fileModifiedAt", "isFavorite", "isArchived", "isReadOnly", "checksum", "duration", "isVisible", "livePhotoVideoId", "originalFileName", "sidecarPath") VALUES (DEFAULT, $1, $2, $3, $4, $5, $6, $7, $8, $9, DEFAULT, DEFAULT, $10, $11, $12, $13, $14, $15, $16, $17, DEFAULT, $18, $19) RETURNING "id", "webpPath", "encodedVideoPath", "createdAt", "updatedAt", "isFavorite", "isArchived", "isReadOnly", "isVisible"
immich_postgres            | 2023-08-09 04:05:31.306 UTC [9040] ERROR:  duplicate key value violates unique constraint "UQ_userid_checksum"
immich_postgres            | 2023-08-09 04:05:31.306 UTC [9040] DETAIL:  Key ("ownerId", checksum)=(b1e4f17d-1885-41f5-904f-3b5b14485c98, \x2732f36a09ebaa44e2740681de030604f1b3dbaf) already exists.
immich_postgres            | 2023-08-09 04:05:31.306 UTC [9040] STATEMENT:  INSERT INTO "assets"("id", "deviceAssetId", "ownerId", "deviceId", "type", "originalPath", "resizePath", "webpPath", "thumbhash", "encodedVideoPath", "createdAt", "updatedAt", "fileCreatedAt", "fileModifiedAt", "isFavorite", "isArchived", "isReadOnly", "checksum", "duration", "isVisible", "livePhotoVideoId", "originalFileName", "sidecarPath") VALUES (DEFAULT, $1, $2, $3, $4, $5, $6, $7, $8, $9, DEFAULT, DEFAULT, $10, $11, $12, $13, $14, $15, $16, $17, DEFAULT, $18, $19) RETURNING "id", "webpPath", "encodedVideoPath", "createdAt", "updatedAt", "isFavorite", "isArchived", "isReadOnly", "isVisible"

# Lame way to count files :P
sr@brutish:/mnt/containers/nextcloud/data/sr/files/Photos$ ls -lR 2019 | wc -l
73
sr@brutish:/mnt/containers/nextcloud/data/sr/files/Photos$ ls -lR 2018 | wc -l
52

# 2018 seems to want to upload 2 files repeatedly
sr@brutish:/mnt/containers/nextcloud/data/sr/files/Photos$ immich upload --key <key> --server http://10.1.1.8:2283/api --recursive 2018/
Checking connectivity with Immich instance...
Server status: OK
Checking credentials...
Login status: OK
Successful authentication for user <email>
Indexing local assets...
Indexing complete, found 2 local assets
Comparing local assets with those on the Immich instance...
A total of 2 assets will be uploaded to the server
Do you want to start upload now? (y/n) y
Start uploading...
Upload Progress | ████████████████████████████████████████ | 100% || 2/2 || Current file [/import/2018/10/20181005-174708-0616.jpg]
sr@brutish:/mnt/containers/nextcloud/data/sr/files/Photos$ immich upload --key <key> --server http://10.1.1.8:2283/api --recursive 2018/
Checking connectivity with Immich instance...
Server status: OK
Checking credentials...
Login status: OK
Successful authentication for user <email>
Indexing local assets...
Indexing complete, found 2 local assets
Comparing local assets with those on the Immich instance...
A total of 2 assets will be uploaded to the server
Do you want to start upload now? (y/n) y
Start uploading...
Upload Progress | ████████████████████████████████████████ | 100% || 2/2 || Current file [/import/2018/10/20181005-174708-0616.jpg]
sr@brutish:/mnt/containers/nextcloud/data/sr/files/Photos$

# 2019 works fine - unfortunately, not sure when these files were uploaded, but it would've been in the last 24-48 hours
sr@brutish:/mnt/containers/nextcloud/data/sr/files/Photos$ immich upload --key <key> --server http://10.1.1.8:2283/api --recursive 2019/
Checking connectivity with Immich instance...
Server status: OK
Checking credentials...
Login status: OK
Successful authentication for user <email>
Indexing local assets...
Indexing complete, found 23 local assets
Comparing local assets with those on the Immich instance...
All assets have been backed up to the server
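
To find out which local file corresponds to the checksum in the postgres DETAIL lines, the files can be hashed locally and compared. The sketch below is only an assumption-laden helper: the logged key is 20 bytes long, which suggests a SHA-1 digest of the file contents, but that is inferred from the length rather than confirmed in this thread. The function names `sha1_of_file` and `find_matches` are made up for illustration and are not part of the Immich CLI.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_matches(root: Path, wanted_hex: str) -> list[Path]:
    """List files under root whose SHA-1 equals wanted_hex.

    wanted_hex may carry the leading "\\x" that postgres prints for bytea values.
    Assumption: the logged checksum is a SHA-1 of the raw file bytes.
    """
    wanted = wanted_hex.lower().removeprefix("\\x")
    return sorted(
        p for p in root.rglob("*") if p.is_file() and sha1_of_file(p) == wanted
    )
```

For example, `find_matches(Path("2018"), r"\x2732f36a09ebaa44e2740681de030604f1b3dbaf")` would list any file under 2018/ with that digest, if the SHA-1 assumption holds.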

jrnewell commented 1 year ago

I see the same behavior: movies are uploaded again on every sync attempt. This also seems to confuse the "remaining" assets count. This is from an iPhone (iOS).

I like this app, but for a backup solution, more work needs to be put into the reliability and resiliency of the backup process (uploads pause frequently). As a user, it is difficult to have confidence that everything is being properly synced. Maybe look into using something like syncthing or rsync under the hood?

jrasm91 commented 1 year ago

The CLI will skip assets that it initially uploaded, but that's it. Duplicate assets will cause unique key violations in the logs, and that is fine and expected. This is probably happening because the duplicates were uploaded from another source.
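
As a rough illustration of why the violation is harmless, here is a minimal sketch of a per-owner checksum constraint. It uses SQLite instead of postgres and a heavily simplified table: only the constraint name `UQ_userid_checksum` and the (owner, checksum) key come from the logs above, while the column names, device labels, and the `try_insert` helper are assumptions for the example.

```python
import sqlite3

# Simplified stand-in for the "assets" table; the real schema has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE assets (
        id INTEGER PRIMARY KEY,
        owner_id TEXT NOT NULL,
        device_id TEXT NOT NULL,
        checksum BLOB NOT NULL,
        CONSTRAINT UQ_userid_checksum UNIQUE (owner_id, checksum)
    )
    """
)


def try_insert(owner_id: str, device_id: str, checksum: bytes) -> bool:
    """Return True if the row was stored, False if the unique constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO assets (owner_id, device_id, checksum) VALUES (?, ?, ?)",
                (owner_id, device_id, checksum),
            )
        return True
    except sqlite3.IntegrityError:
        return False


checksum = bytes.fromhex("2732f36a09ebaa44e2740681de030604f1b3dbaf")
first = try_insert("user-1", "phone", checksum)  # first copy, e.g. from the mobile app
second = try_insert("user-1", "CLI", checksum)   # same bytes sent again from another source
other = try_insert("user-2", "CLI", checksum)    # a different owner may store the same file
count = conn.execute("SELECT COUNT(*) FROM assets").fetchone()[0]
print(first, second, other, count)  # True False True 2
```

The second insert is rejected and logged as an error, but no duplicate row is stored, which matches the behaviour described above.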

There is some work in progress regarding the CLI and mobile apps to have a better algorithm for determining what to upload and what to skip, but it is working as currently designed, even if it has some limitations.
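
Purely as a sketch of what a checksum-based skip decision could look like (this is not necessarily what the work in progress implements), the client could hash each local file and upload only those whose digest the server does not already hold. The `plan_upload` function, its inputs, and the use of SHA-1 are all assumptions for illustration.

```python
import hashlib


def plan_upload(
    local_files: dict[str, bytes], server_checksums: set[str]
) -> tuple[list[str], list[str]]:
    """Split local files into (to_upload, to_skip) by content checksum.

    local_files maps a file name to its bytes; server_checksums holds hex SHA-1
    digests the server already has for this user (a hypothetical input).
    """
    to_upload: list[str] = []
    to_skip: list[str] = []
    seen = set(server_checksums)
    for name in sorted(local_files):
        digest = hashlib.sha1(local_files[name]).hexdigest()
        if digest in seen:
            to_skip.append(name)
        else:
            to_upload.append(name)
            seen.add(digest)  # later local copies of the same bytes are skipped too
    return to_upload, to_skip
```

With a plan like this, files that reached the server from another source would be skipped up front instead of being rejected by the database on every run.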

tidalvirus commented 1 year ago

@jrasm91 - appreciate that this is expected behaviour - I can cope with that :) However, can I ask where the better algorithm stuff is being tracked, so I (and other interested parties) can follow it?

Thanks!!

jrasm91 commented 1 year ago

https://github.com/immich-app/immich/issues/2567