immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

[BUG] immich upload CLI silently ignore some fails (EXIF prob?) and wants to re-upload them #3567

Closed doegox closed 9 months ago

doegox commented 1 year ago

The bug

Hi, sorry if the title is unclear; this is not easy to describe in one sentence.

The OS that Immich Server is running on

Docker on Debian

Version of Immich Server

v1.71.0

Version of Immich Mobile App

v1.71.0

Platform with the issue

Your docker-compose.yml content

version: "3.8"

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: [ "start.sh", "immich" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always

  immich-microservices:
    container_name: immich_microservices
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: [ "start.sh", "microservices" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always

  immich-web:
    container_name: immich_web
    image: ghcr.io/immich-app/immich-web:${IMMICH_VERSION:-release}
    env_file:
      - .env
    restart: always

  typesense:
    container_name: immich_typesense
    image: typesense/typesense:0.24.1@sha256:9bcff2b829f12074426ca044b56160ca9d777a0c488303469143dd9f8259d4dd
    environment:
      - TYPESENSE_API_KEY=${TYPESENSE_API_KEY}
      - TYPESENSE_DATA_DIR=/data
    volumes:
      - tsdata:/data
    restart: always

  redis:
    container_name: immich_redis
    image: redis:6.2-alpine@sha256:70a7a5b641117670beae0d80658430853896b5ef269ccf00d1827427e3263fa3
    restart: always

  database:
    container_name: immich_postgres
    image: postgres:14-alpine@sha256:28407a9961e76f2d285dc6991e8e48893503cc3836a4755bbc2d40bcc272a441
    env_file:
      - .env
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      PG_DATA: /var/lib/postgresql/data
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: always

  immich-proxy:
    container_name: immich_proxy
    image: ghcr.io/immich-app/immich-proxy:${IMMICH_VERSION:-release}
    environment:
      # Make sure these values get passed through from the env file
      - IMMICH_SERVER_URL
      - IMMICH_WEB_URL
    ports:
      - 2283:8080
    depends_on:
      - immich-server
      - immich-web
    restart: always

  backup:
    container_name: immich_db_dumper
    image: prodrigestivill/postgres-backup-local
    env_file:
      - .env
    environment:
      POSTGRES_HOST: database
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      SCHEDULE: "@daily"
      BACKUP_KEEP_DAYS: 7
      BACKUP_KEEP_WEEKS: 4
      BACKUP_KEEP_MONTHS: 6
      BACKUP_DIR: /db_dumps
    volumes:
      - ./db_dumps:/db_dumps
    depends_on:
      - database

volumes:
  pgdata:
  model-cache:
  tsdata:

Your .env content

###################################################################################
# Database
###################################################################################

# NOTE: The following four database variables support Docker secrets by adding a *_FILE suffix to the variable name
# See the docker-compose documentation on secrets for additional details: https://docs.docker.com/compose/compose-file/compose-file-v3/#secrets
DB_HOSTNAME=immich_postgres
DB_USERNAME=postgres
DB_PASSWORD=:p
DB_DATABASE_NAME=immich

# Optional Database settings:
# DB_PORT=5432

###################################################################################
# Redis
###################################################################################

REDIS_HOSTNAME=immich_redis

# REDIS_URL will be used to pass custom options to ioredis.
# Example for Sentinel
# {"sentinels":[{"host":"redis-sentinel-node-0","port":26379},{"host":"redis-sentinel-node-1","port":26379},{"host":"redis-sentinel-node-2","port":26379}],"name":"redis-sentinel"}
# REDIS_URL=ioredis://eyJzZW50aW5lbHMiOlt7Imhvc3QiOiJyZWRpcy1zZW50aW5lbDEiLCJwb3J0IjoyNjM3OX0seyJob3N0IjoicmVkaXMtc2VudGluZWwyIiwicG9ydCI6MjYzNzl9XSwibmFtZSI6Im15bWFzdGVyIn0=

# Optional Redis settings:

# Note: these parameters are not automatically passed to the Redis Container
# to do so, please edit the docker-compose.yml file as well. Redis is not configured
# via environment variables, only redis.conf or the command line

# REDIS_PORT=6379
# REDIS_DBINDEX=0
# REDIS_USERNAME=
# REDIS_PASSWORD=
# REDIS_SOCKET=

###################################################################################
# Upload File Location
#
# This is the location where uploaded files are stored.
###################################################################################

UPLOAD_LOCATION=/home/immich

###################################################################################
# Typesense
###################################################################################
TYPESENSE_API_KEY=:p
# TYPESENSE_ENABLED=false
# TYPESENSE_URL uses base64 encoding for the nodes json.
# Example JSON that was used:
# [
#      { "host": "typesense-1.example.net", "port": "443", "protocol": "https" },
#      { "host": "typesense-2.example.net", "port": "443", "protocol": "https" },
#      { "host": "typesense-3.example.net", "port": "443", "protocol": "https" },
# ]
# TYPESENSE_URL=ha://WwogIHsgImhvc3QiOiAidHlwZXNlbnNlLTEuZXhhbXBsZS5uZXQiLCAicG9ydCI6ICI0NDMiLCAicHJvdG9jb2wiOiAiaHR0cHMiIH0sCiAgeyAiaG9zdCI6ICJ0eXBlc2Vuc2UtMi5leGFtcGxlLm5ldCIsICJwb3J0IjogIjQ0MyIsICJwcm90b2NvbCI6ICJodHRwcyIgfSwKICB7ICJob3N0IjogInR5cGVzZW5zZS0zLmV4YW1wbGUubmV0IiwgInBvcnQiOiAiNDQzIiwgInByb3RvY29sIjogImh0dHBzIiB9Cl0=

###################################################################################
# Reverse Geocoding
#
# Reverse geocoding is done locally which has a small impact on memory usage
# This memory usage can be altered by changing the REVERSE_GEOCODING_PRECISION variable
# This ranges from 0-3 with 3 being the most precise
# 3 - Cities > 500 population: ~200MB RAM
# 2 - Cities > 1000 population: ~150MB RAM
# 1 - Cities > 5000 population: ~80MB RAM
# 0 - Cities > 15000 population: ~40MB RAM
####################################################################################

# DISABLE_REVERSE_GEOCODING=false
# REVERSE_GEOCODING_PRECISION=3

####################################################################################
# WEB - Optional
#
# Custom message on the login page, should be written in HTML form.
# For example:
# PUBLIC_LOGIN_PAGE_MESSAGE="This is a demo instance of Immich.<br><br>Email: <i>demo@demo.de</i><br>Password: <i>demo</i>"
####################################################################################

PUBLIC_LOGIN_PAGE_MESSAGE=

####################################################################################
# Alternative Service Addresses - Optional
#
# This is an advanced feature for users who may be running their immich services on different hosts.
# It will not change which address or port that services bind to within their containers, but it will change where other services look for their peers.
# Note: immich-microservices is bound to 3002, but no references are made
####################################################################################

IMMICH_WEB_URL=http://immich-web:3000
IMMICH_SERVER_URL=http://immich-server:3001
IMMICH_MACHINE_LEARNING_URL=http://immich-machine-learning:3003

####################################################################################
# Alternative API's External Address - Optional
#
# This is an advanced feature used to control the public server endpoint returned to clients during Well-known discovery.
# You should only use this if you want mobile apps to access the immich API over a custom URL. Do not include trailing slash.
# NOTE: At this time, the web app will not be affected by this setting and will continue to use the relative path: /api
# Examples: http://localhost:3001, http://immich-api.example.com, etc
####################################################################################

#IMMICH_API_URL_EXTERNAL=http://localhost:3001

###################################################################################
# Immich Version - Optional
#
# This allows all immich docker images to be pinned to a specific version. By default,
# the version is "release" but could be a specific version, like "v1.59.0".
###################################################################################

#IMMICH_VERSION=

Reproduction steps

Use immich-cli upload on a bunch of pics. It uploads them without any apparent error. Re-run the same command on the same pics. A few of them are queued for upload again, so they apparently never made it to the server. Re-run the same command on the same pics once more. The same subset is queued for upload again.

I'm not sure what the reason is, but when I sanitize the EXIF data with

exiftool -all= -tagsfromfile @ -all:all -unsafe -icc_profile -P -Overwrite_Original 

Then I can upload those pics, and if I repeat the command, they are correctly detected as already present on the server. So I suspect these pics fail on the server because of some EXIF parsing problem, but the error is not reported back to immich-cli.

I selected 2 such pics, before and after the EXIF fix, so you can test on your side. I'll try to attach them to this issue; otherwise I'll find another way.

Additional information

$ docker run -it --rm -v "$(pwd):/import" ghcr.io/immich-app/immich-cli:latest --version
0.40.2

doegox commented 1 year ago

Failing pics: aab IMG_20220605_192018

doegox commented 1 year ago

Same pics but sanitized EXIF, those work fine. aab IMG_20220605_192018

doegox commented 1 year ago

Side note: because there is no way to get immich-cli to print the pics it's uploading (they only appear briefly in the progress bar), locating such failing pics among a larger set is a bit like finding a needle in a haystack...

doegox commented 1 year ago

Maybe it is not an EXIF problem after all. I see that https://github.com/immich-app/immich/issues/3615 has been created since then, mentioning duplicate keys in Postgres, and when I check my Postgres logs I also see

2023-08-13 12:39:36.737 UTC [110278] ERROR:  duplicate key value violates unique constraint "UQ_userid_checksum"
2023-08-13 12:39:36.737 UTC [110278] DETAIL:  Key ("ownerId", checksum)=(35755538-1616-4aef-841a-8a0b873fd08b, \xe934a486649279181889266e08eb8aa04dfde96e) already exists.
2023-08-13 12:39:36.737 UTC [110278] STATEMENT:  INSERT INTO "assets"("id", "deviceAssetId", "ownerId", "deviceId", "type", "originalPath", "resizePath", "webpPath", "thumbhash", "encodedVideoPath", "createdAt", "updatedAt", "fileCreatedAt", "fileModifiedAt", "isFavorite", "isArchived", "isReadOnly", "checksum", "duration", "isVisible", "livePhotoVideoId", "originalFileName", "sidecarPath") VALUES (DEFAULT, $1, $2, $3, $4, $5, $6, $7, $8, $9, DEFAULT, DEFAULT, $10, $11, $12, $13, $14, $15, $16, $17, DEFAULT, $18, $19) RETURNING "id", "webpPath", "encodedVideoPath", "createdAt", "updatedAt", "isFavorite", "isArchived", "isReadOnly", "isVisible"

and maybe the "EXIF fix" had just the side effect of modifying the file checksum...
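
To illustrate that hypothesis, here is a minimal sketch of a unique (ownerId, checksum) constraint, written as an in-memory stand-in. It is not Immich's code: `AssetTable`, `insert`, the owner ID and the short checksum strings are all hypothetical. Only the constraint name and the shape of the key come from the Postgres log above.

```typescript
// In-memory stand-in for a table with a unique (ownerId, checksum) constraint.
// All names are hypothetical; only the key shape comes from the Postgres log.
interface AssetRow {
  ownerId: string;
  checksum: string; // digest of the file content (placeholder values below)
  originalFileName: string;
}

class AssetTable {
  private seen: Record<string, AssetRow> = {};

  // Returns an error message instead of throwing, so the caller decides
  // whether to report the failure or drop it silently.
  insert(row: AssetRow): string | null {
    const key = row.ownerId + "|" + row.checksum;
    if (Object.prototype.hasOwnProperty.call(this.seen, key)) {
      return 'duplicate key value violates unique constraint "UQ_userid_checksum"';
    }
    this.seen[key] = row;
    return null;
  }
}

const table = new AssetTable();
const owner = "user-1";

// First upload of some content succeeds.
const first = table.insert({ ownerId: owner, checksum: "aaaa", originalFileName: "a.jpg" });
// Same bytes again (same checksum) are rejected, whatever the file name.
const sameBytes = table.insert({ ownerId: owner, checksum: "aaaa", originalFileName: "b.jpg" });
// Rewriting the metadata changes the bytes, hence the checksum, so this passes.
const rewrittenExif = table.insert({ ownerId: owner, checksum: "bbbb", originalFileName: "b.jpg" });

console.log(first, sameBytes, rewrittenExif);
```

If the client never surfaces the message returned for the second insert, the file looks uploaded on the client side while nothing new is stored, which would match the behaviour described in this issue.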

kristof-mattei commented 1 year ago

If we check the http://immich/api/asset/CLI endpoint there are missing files.

I definitely see that it does NOT return the .MOV associated with .HEIC / .JPG for iOS 'live' photos. This causes my system to try to upload files over and over.

Second issue, if you check the endpoint, it's actually very risky...

It contains IDs built from filename + filesize. So if you have 2 files with different names in the same directory (and the same content), you'll get a crash, as the checksum is filename-independent, yet the ID is based on filename-filesize:

https://github.com/immich-app/CLI/blob/ee750b8d48fce68bd7ada07c540e9d8957e7124e/bin/index.ts#L166

Equally, if you have 2 files with the same name and same size (which honestly isn't that weird of a collision, I have 5 distinct IMG_1999.JPG on my machine) only one will be uploaded.
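
A minimal sketch of both failure modes, assuming the ID is simply the file name joined to the size with a dash. `LocalFile`, `nameSizeId`, `distinctKeys` and all sample paths, sizes and checksums are hypothetical and only serve the illustration; this is not the CLI's actual code.

```typescript
// Sketch of the two identification schemes discussed above.
// All names and sample values are hypothetical.
interface LocalFile {
  path: string;
  size: number;     // bytes
  checksum: string; // content digest (placeholder values)
}

// ID in the style described above: file name plus file size, directory ignored.
function nameSizeId(file: LocalFile): string {
  const base = file.path.slice(file.path.lastIndexOf("/") + 1);
  return base + "-" + file.size;
}

// Count how many distinct keys a set of files produces under a given scheme.
function distinctKeys(files: LocalFile[], keyOf: (f: LocalFile) => string): number {
  const keys: Record<string, true> = {};
  for (const f of files) {
    keys[keyOf(f)] = true;
  }
  return Object.keys(keys).length;
}

const byChecksum = (f: LocalFile): string => f.checksum;

// Two different photos that happen to share a name and a size.
const sameNameSameSize: LocalFile[] = [
  { path: "2019/IMG_1999.JPG", size: 748, checksum: "aaaa" },
  { path: "2021/IMG_1999.JPG", size: 748, checksum: "bbbb" },
];

// The same bytes stored under two different names.
const sameBytesTwoNames: LocalFile[] = [
  { path: "album/a.jpg", size: 748, checksum: "aaaa" },
  { path: "album/b.jpg", size: 748, checksum: "aaaa" },
];

console.log(distinctKeys(sameNameSameSize, nameSizeId));  // 1: second photo looks already uploaded
console.log(distinctKeys(sameNameSameSize, byChecksum));  // 2: the contents really differ
console.log(distinctKeys(sameBytesTwoNames, nameSizeId)); // 2: both files get sent
console.log(distinctKeys(sameBytesTwoNames, byChecksum)); // 1: a checksum constraint can reject the second
```

Keying on the content checksum on both sides would make the client and the server agree on what counts as a duplicate, at the cost of hashing every local file before upload.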

kristof-mattei commented 1 year ago

Update: there is a column that defines whether an item is visible or not (I'm assuming to hide live photos). The asset/:deviceId does not return invisible items causing the CLI to try to upload 'invisible' items over and over.
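
The resulting loop can be sketched as follows. This is an illustration under assumptions, not the server's or the CLI's code: `ServerAsset`, `listedIds`, `pendingUploads` and the sample IDs are hypothetical, and the only behaviour taken from the observation above is that invisible items are left out of the listing.

```typescript
// Sketch of the re-upload loop described above. All names are hypothetical.
interface ServerAsset {
  deviceAssetId: string;
  isVisible: boolean;
}

// What the listing endpoint is reported to do: leave out rows that are not visible.
function listedIds(assets: ServerAsset[]): string[] {
  return assets.filter((a) => a.isVisible).map((a) => a.deviceAssetId);
}

// What the client does with that list: queue everything it does not find there.
function pendingUploads(localIds: string[], knownIds: string[]): string[] {
  return localIds.filter((id) => knownIds.indexOf(id) === -1);
}

const localIds = ["IMG_0001.HEIC-2048", "IMG_0001.MOV-4096"];

// State after a first successful run: both parts of the live photo are stored,
// but the video part is marked invisible.
const stored: ServerAsset[] = [
  { deviceAssetId: "IMG_0001.HEIC-2048", isVisible: true },
  { deviceAssetId: "IMG_0001.MOV-4096", isVisible: false },
];

// Second run against the filtered listing: the .MOV is queued again.
const secondRun = pendingUploads(localIds, listedIds(stored));

// Second run if the listing included invisible items: nothing is queued.
const allIds = stored.map((a) => a.deviceAssetId);
const secondRunIfUnfiltered = pendingUploads(localIds, allIds);

console.log(secondRun);
console.log(secondRunIfUnfiltered);
```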

kristof-mattei commented 1 year ago

Proof of collision: 2 JPGs with different checksums but identical names and sizes are considered identical.

kristof in ~/image-collision
❮ fd --type file
first/image.jpg
second/image.jpg

kristof in ~/image-collision
❯ fd --type file | xargs sha1sum
ce4de1fa20ab3c6d34034fcd7db29e909d26fc52  first/image.jpg
700b761b3f7584cfc0983a31230aa7d657ff918b  second/image.jpg

kristof in ~/image-collision
❮ fd --type file --exec node --eval "const name = process.argv[2]; console.log(path.basename(name) + \"-\" + fs.statSync(name).size)" -
image.jpg-748
image.jpg-748

The second file never gets uploaded because its ID is already in asset/:deviceId.

hugoghx commented 12 months ago

+1 Also having this issue. Specifically, it seems to happen with HEIC files (?). Unsure.

alextran1502 commented 10 months ago

Hello, is this issue still relevant?

kristof-mattei commented 10 months ago

Let me test!

kristof-mattei commented 10 months ago

Repeated the same steps: 2 images, both with an identical filename-filesize. Both get uploaded successfully.

CLI seems to have stopped working though, can't test through there.

jrasm91 commented 9 months ago

We just released a rewrite of the CLI. I'm not sure if it has the same problem or not.

doegox commented 9 months ago

I don't have issues anymore with the new client, thank you!