immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

[BUG] Exif Data not processed for large video files #4349

Closed maxbraun91 closed 5 months ago

maxbraun91 commented 1 year ago

The bug

Dear Team, first of all: thank you so much for your amazing work! I ran into an issue where Immich does not extract metadata for a large video file (6.4 GB) in my external library. When I manually checked the EXIF data with exiftool, I got the following output:

ExifTool Version Number : 12.40
File Name : 2023-09-23_13-48-19.MOV
Directory : .
File Size : 6.4 GiB
File Modification Date/Time : 2023:09:24 20:29:33+02:00
File Access Date/Time : 2023:10:04 10:14:35+02:00
File Inode Change Date/Time : 2023:09:24 20:29:33+02:00
File Permissions : -rwxr-xr-x
File Type : MOV
File Type Extension : mov
MIME Type : video/quicktime
Major Brand : Apple QuickTime (.MOV/QT)
Minor Version : 0.0.0
Compatible Brands : qt
Warning : End of processing at large atom (LargeFileSupport not enabled)

When I added the parameter -api largefilesupport=1, the metadata was extracted as expected. Is it possible to enable large file support in Immich as well?

Many thanks once again!

The OS that Immich Server is running on

Ubuntu 22.04.3

Version of Immich Server

v1.81.1

Version of Immich Mobile App

v1.80

Platform with the issue

Your docker-compose.yml content

version: "3.8"

services:

  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:release
    command: ["start-server.sh"]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - Library Folders
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always

  immich-microservices:
    container_name: immich_microservices
    image: ghcr.io/immich-app/immich-server:release
    extends:
      file: hwaccel.yml
      service: hwaccel
    command: ["start-microservices.sh"]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - Library Folders
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:release
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - ./immich_model-cache:/cache
    env_file:
      - .env
    restart: always

  immich-web:
    container_name: immich_web
    image: ghcr.io/immich-app/immich-web:release
    entrypoint: ["/bin/sh", "./entrypoint.sh"]
    env_file:
      - .env
    restart: always

  typesense:
    container_name: immich_typesense
    image: typesense/typesense:0.24.0
    environment:
      - TYPESENSE_API_KEY=${TYPESENSE_API_KEY}
      - TYPESENSE_DATA_DIR=/data
    logging:
      driver: none
    volumes:
      - ./immich_tsdata:/data
    restart: always

  redis:
    container_name: immich_redis
    image: redis:6.2
    restart: always

  database:
    container_name: immich_postgres
    image: postgres:14
    env_file:
      - .env
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      PG_DATA: /var/lib/postgresql/data
    ports:
      - 5432:5432
    volumes:
      - ./immich_pgdata:/var/lib/postgresql/data
    restart: always

  immich-proxy:
    container_name: immich_proxy
    image: ghcr.io/immich-app/immich-proxy:release
    environment:
      # Make sure these values get passed through from the env file
      - IMMICH_SERVER_URL
      - IMMICH_WEB_URL
    ports:
      - 2283:8080
    logging:
      driver: none
    depends_on:
      - immich-server
    restart: always

Your .env content

###################################################################################
# Database
###################################################################################

DB_HOSTNAME=immich_postgres
DB_USERNAME=postgres
DB_PASSWORD=###
DB_DATABASE_NAME=immich

# Optional Database settings:
# DB_PORT=5432

###################################################################################
# Redis
###################################################################################

REDIS_HOSTNAME=immich_redis

# Optional Redis settings:

# Note: these parameters are not automatically passed to the Redis Container
# to do so, please edit the docker-compose.yml file as well. Redis is not configured
# via environment variables, only redis.conf or the command line

# REDIS_PORT=6379
# REDIS_DBINDEX=0
# REDIS_PASSWORD=
# REDIS_SOCKET=

###################################################################################
# Upload File Location
#
# This is the location where uploaded files are stored.
###################################################################################

UPLOAD_LOCATION=./uploadlocation

###################################################################################
# Typesense
###################################################################################
TYPESENSE_API_KEY=###
# TYPESENSE_ENABLED=false

###################################################################################
# Reverse Geocoding
#
# Reverse geocoding is done locally which has a small impact on memory usage
# This memory usage can be altered by changing the REVERSE_GEOCODING_PRECISION variable
# This ranges from 0-3 with 3 being the most precise
# 3 - Cities > 500 population: ~200MB RAM
# 2 - Cities > 1000 population: ~150MB RAM
# 1 - Cities > 5000 population: ~80MB RAM
# 0 - Cities > 15000 population: ~40MB RAM
####################################################################################

# DISABLE_REVERSE_GEOCODING=false
# REVERSE_GEOCODING_PRECISION=3

####################################################################################
# WEB - Optional
#
# Custom message on the login page, should be written in HTML form.
# For example:
# PUBLIC_LOGIN_PAGE_MESSAGE="This is a demo instance of Immich.<br><br>Email: <i>demo@demo.de</i><br>Password: <i>demo</i>"
####################################################################################

PUBLIC_LOGIN_PAGE_MESSAGE=###

####################################################################################
# Alternative Service Addresses - Optional
#
# This is an advanced feature for users who may be running their immich services on different hosts.
# It will not change which address or port that services bind to within their containers, but it will change where other services look for their peers.
# Note: immich-microservices is bound to 3002, but no references are made
####################################################################################

IMMICH_WEB_URL=http://immich-web:3000
IMMICH_SERVER_URL=http://immich-server:3001
IMMICH_MACHINE_LEARNING_URL=http://immich-machine-learning:3003

####################################################################################
# Alternative API's External Address - Optional
#
# This is an advanced feature used to control the public server endpoint returned to clients during Well-known discovery.
# You should only use this if you want mobile apps to access the immich API over a custom URL. Do not include trailing slash.
# NOTE: At this time, the web app will not be affected by this setting and will continue to use the relative path: /api
# Examples: http://localhost:3001, http://immich-api.example.com, etc
####################################################################################

#IMMICH_API_URL_EXTERNAL=http://localhost:3001

Reproduction steps

1. Set up immich instance
2. Load video file of 6.4GB size into external library
3. See no metadata

Additional information

No response

alextran1502 commented 1 year ago

Hello is this issue still there?

maxbraun91 commented 1 year ago

Hi Alex, yes, the issue is still present. I performed a full EXIF scan today and the large video, where I noticed the issue, still does not have any metadata.

pinionless commented 11 months ago

Issue still exists. Immich will not extract exif if the file is bigger than 2GB

This is the log message Immich gives at the end of exif extraction:

"Warning": "End of processing at large atom (LargeFileSupport not enabled)"

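A minimal TypeScript sketch of how the truncated extraction could be detected from the tag object, assuming the `Warning` and `warnings` fields appear as in the debug log posted later in this thread. The interface and the function name `hitLargeFileLimit` are hypothetical and not part of Immich or exiftool-vendored.

```typescript
// Hypothetical shape: only the two warning fields seen in the log are modelled.
interface TagsWithWarnings {
  Warning?: string;
  warnings?: string[];
}

// Substring of the warning ExifTool prints when it stops at a large atom.
const LARGE_FILE_WARNING = "LargeFileSupport not enabled";

// Returns true when ExifTool stopped early because large file support was off.
function hitLargeFileLimit(tags: TagsWithWarnings): boolean {
  const messages = [tags.Warning ?? "", ...(tags.warnings ?? [])];
  return messages.some((message) => message.includes(LARGE_FILE_WARNING));
}
```

A check like this only reports that metadata is incomplete; it does not recover the missing tags.
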
DeltaTango69 commented 9 months ago

I can confirm this behaviour. You have to add

exiftool -api LargeFileSupport=1 ........

when exiftool reads the metadata!
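
For anyone scripting this outside Immich, a rough TypeScript sketch of a one-shot call with the flag could look like the following. It assumes an `exiftool` binary on the PATH; `largeFileArgs` and `readTagsOnce` are hypothetical helper names, and this is not how Immich itself invokes ExifTool.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Builds the argument list for a one-shot exiftool call with large file support on.
function largeFileArgs(file: string): string[] {
  return ["-api", "LargeFileSupport=1", "-json", file];
}

// Runs exiftool (assumed to be on PATH) and parses its JSON output,
// which is an array with one object per input file.
async function readTagsOnce(file: string): Promise<Record<string, unknown>> {
  const { stdout } = await execFileAsync("exiftool", largeFileArgs(file));
  const [tags] = JSON.parse(stdout) as Record<string, unknown>[];
  return tags;
}
```

Passing the arguments as an array avoids shell quoting problems with file names that contain spaces.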

pinionless commented 9 months ago

Old bug. Devs ignore it.

stephen304 commented 6 months ago

Also experienced this with a simple 30-minute 1080p 30fps video recorded on a Pixel 5. It's about 6GB.

Not only does the video date fall back to when I uploaded it and not have the GPS location, the video length appears as 0:00.

This might also be related to the video showing as "not backed up" (cloud icon crossed out) in my mobile app. The upload succeeded and I checked that the file on the server has the same MD5, but the app seems to think it wasn't backed up. If it compares the date taken from the metadata, the two could be connected. Oddly, the successfully uploaded video, which shows up as May 5, doesn't appear as a duplicate in the app, which has the video as May 4.

Edit: Oddly, logging out and back in to rebuild the timeline fixes the sync status issue. The app now shows the video under the incorrect date but with cloud+local, so it seems to have realized the backed-up copy is the same as the local copy.

TransRapid commented 6 months ago

I can confirm this behaviour. You have to add

exiftool -api LargeFileSupport=1 ........

when exiftool reads the metadata!

add it where?

TransRapid commented 6 months ago

SOLUTION - SIMPLE WORKAROUND

https://github.com/photostructure/exiftool-vendored.js/issues/109#issuecomment-1043477394

This was the solution. The very easy fix is to create an ExifTool config file with the options you need.

.ExifTool_config in the /app/node_modules/exiftool-vendored.pl directory.

.ExifTool_config in the /app/node_modules/exiftool-vendored.pl/bin directory

%Image::ExifTool::UserDefined::Options = (
    LargeFileSupport => 1,
);

You might also need to place it in your Immich home directory, or wherever cd ~ takes you.

I am running Immich in a venv, on bare metal myself.

Also very important

Restart your containers, or in my case run sudo systemctl restart immich-machine-learning && systemctl restart immich-microservices && systemctl restart immich. It should work fine once you do that.

stephen304 commented 6 months ago

@TransRapid Did you just refresh the metadata of the specific asset after doing that to get it to work? I'm using docker so I mapped the .ExifTool_config file into /usr/src/app/node_modules/exiftool-vendored.pl, which exists on immich_server and immich_microservices, but not immich_machine_learning, so I skipped that one. I also mapped it into ~ (/root) and I can successfully cat the file to see the contents, it's owned by root and 660 permissions. I restarted the containers and refreshed the metadata of the large file but it still shows 0:00 for the duration unfortunately.

TransRapid commented 5 months ago

@stephen304

I can't remember if I refreshed the data or had removed the file first and then re-uploaded it, but I think I re-uploaded the file. I would try downloading the file and removing it from Immich entirely. Try removing it from the trash too.

Also /usr/src/app/node_modules/exiftool-vendored.pl /usr/src/app/node_modules/exiftool-vendored.pl/bin should be the correct directory if it is that one that is making the difference, but I believe it is from the root, as per the official documentation mentioning that it should go in your home directory, and that is in fact where the home directory for immich is as far as docker goes. I didn't place it in any of the other directories you mentioned, just the one above and my immich user home folder.

The other consideration might be some SQL filesize limit.

Also, try turning on verbose logging within the Administration settings from the web interface. Then do something like tail -f /var/logs/immich/immich-microservices.log so you can see if anything else is tripping it up. Your log likely is elsewhere, but you want the immich-microservices.log.

How are you importing these files? Have you tried a new large file? What do your logs show when you try this?

stephen304 commented 5 months ago

@TransRapid So I figured it out! And yes I was also only putting the config in the home directory (/root) and the exiftool-vendored.pl directory, but I wasn't sure if I needed to do it for the immich_app container or immich_microservices, so I did both for a while.

At any rate, that didn't work (even when reuploading) and I think the reason why is that exiftool may not be looking in /root in the docker image even though $HOME is set to /root. I also tried /home/node which exists in the docker image and that didn't work either.

What did work though was the folder at /usr/src/app/node_modules/exiftool-vendored.pl/bin/, and I only mapped it into the immich_microservices container since from the log lines it seems that's the one that does the exif processing - though from looking at the docker-compose in the main branch it looks like immich-microservices is going away next update and being merged into the main app container. I confirmed in the microservices log it shows all the metadata and immich successfully shows the correct time (not file modify time), gps coordinates, and correct video length instead of 0:00.

So the workaround for docker users should just be:

    volumes:
      - ./.ExifTool_config:/usr/src/app/node_modules/exiftool-vendored.pl/bin/.ExifTool_config

added to immich_microservices (I just added it to the main container too so I don't forget to keep the workaround next update when the main container starts doing the microservices stuff).

And of course .ExifTool_config contains

%Image::ExifTool::UserDefined::Options = (
    LargeFileSupport => 1,
);
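
If the config file is generated by a setup script rather than written by hand, a small TypeScript sketch such as the one below could render the same content. The function name `renderExifToolConfig` is hypothetical, and only numeric option values are handled to keep the example short.

```typescript
// Renders the body of a .ExifTool_config file that sets the given API options.
// Only numeric values are supported in this sketch.
function renderExifToolConfig(options: Record<string, number>): string {
  const entries = Object.entries(options).map(
    ([name, value]) => `    ${name} => ${value},`,
  );
  return ["%Image::ExifTool::UserDefined::Options = (", ...entries, ");", ""].join("\n");
}

// Prints the same three lines shown above.
console.log(renderExifToolConfig({ LargeFileSupport: 1 }));
```

The resulting string can be written to the .ExifTool_config file that the volume mapping above points at.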

I haven't tested whether refreshing metadata on the bugged items fixes it since I switched to deleting, emptying trash, and reuploading like you did just to make sure I was properly testing the fix.

Hope this helps anybody else with this issue until a built-in fix is implemented.

TransRapid commented 5 months ago

Ah okay so the winning location to place the config file in any case seems to be in the same directory as your exiftool executable.

I did check and that is the same area I have mine in. I don't know why I didn't catch that here when I posted. I will update mine as well for anyone else finding this.

mertalev commented 5 months ago

Fixed via #9894

HpNoTiQ56 commented 5 months ago

Hi, I've updated to the latest version, 1.106.1. GPS data is now shown, but the duration is still 0:00 in the server tabs, although it is correct when clicking on the video asset (video: 3.3 GB). Another video has no EXIF data at all (GPS and time).

jrasm91 commented 5 months ago

You need to re-run metadata extraction for the affected assets.

stephen304 commented 5 months ago

So it actually looks like this bug wasn't fixed by #9894, at least with my 6GB file. I removed the .ExifTool_config file and upgraded to 1.106.1, deleted the large video and emptied the trash, then re-uploaded it, and it presented the same problem as before: 0:00 length, dated as today when I uploaded it, and no GPS location. Restoring the .ExifTool_config file and using the "Refresh metadata" option fixed it. (It's nice to know that just using the refresh button is enough to test this, since I had been re-uploading the file every time before.)

Looking at the debug log when refreshing the metadata confirms this:

[Nest] 17  - 06/11/2024, 12:13:57 PM   DEBUG [Api:LoggingInterceptor~c99ba85q] POST /api/assets/jobs 204 9.48ms 216.252.201.10
[Nest] 17  - 06/11/2024, 12:13:58 PM VERBOSE [Api:LoggingInterceptor~c99ba85q] {"assetIds":["156fa103-4441-4c31-9a93-d8f7825be51f"],"name":"refresh-metadata"}
[Nest] 7  - 06/11/2024, 12:13:58 PM VERBOSE [Microservices:MetadataService] Exif Tags
[Nest] 7  - 06/11/2024, 12:13:58 PM VERBOSE [Microservices:MetadataService] Object:
{
  "SourceFile": "/usr/src/app/upload/library/admin/2024/2024-05-04/PXL_20240504_233345242.mp4",
  "errors": [],
  "tz": "UTC",
  "tzSource": "defaultVideosToUTC",
  "ExifToolVersion": 12.85,
  "FileName": "PXL_20240504_233345242.mp4",
  "Directory": "/usr/src/app/upload/library/admin/2024/2024-05-04",
  "FileSize": "6.7 GB",
  "FileModifyDate": {
    "_ctor": "ExifDateTime",
    "year": 2024,
    "month": 6,
    "day": 11,
    "hour": 11,
    "minute": 52,
    "second": 26,
    "tzoffsetMinutes": -240,
    "rawValue": "2024:06:11 11:52:26-04:00",
    "zoneName": "UTC-4",
    "inferredZone": false
  },
  "FileAccessDate": {
    "_ctor": "ExifDateTime",
    "year": 2024,
    "month": 6,
    "day": 11,
    "hour": 12,
    "minute": 8,
    "second": 54,
    "tzoffsetMinutes": -240,
    "rawValue": "2024:06:11 12:08:54-04:00",
    "zoneName": "UTC-4",
    "inferredZone": false
  },
  "FileInodeChangeDate": {
    "_ctor": "ExifDateTime",
    "year": 2024,
    "month": 6,
    "day": 11,
    "hour": 12,
    "minute": 8,
    "second": 54,
    "tzoffsetMinutes": -240,
    "rawValue": "2024:06:11 12:08:54-04:00",
    "zoneName": "UTC-4",
    "inferredZone": false
  },
  "FileType": "MP4",
  "FileTypeExtension": "mp4",
  "MIMEType": "video/mp4",
  "MajorBrand": "MP4 Base Media v1 [IS0 14496-12:2003]",
  "MinorVersion": "2.0.0",
  "CompatibleBrands": [
    "isom",
    "iso2",
  ],
  "Warning": "End of processing at large atom (LargeFileSupport not enabled)",
  "warnings": [
    "End of processing at large atom (LargeFileSupport not enabled)"
  ]
}
[Nest] 7  - 06/11/2024, 12:13:59 PM   DEBUG [Microservices:MediaService] Attempting to rename file: upload/library/admin/2024/2024-05-04/PXL_20240504_233345242.mp4 => upload/library/admin/2024/2024-05-05/PXL_20240504_233345242.mp4

HpNoTiQ56 commented 5 months ago

Fixed by using workaround : volumes:

stephen304 commented 5 months ago

So I did a little more debugging, using strace -p <pid> -e read,write -s 1000000 with the pid of the long-running exiftool process let me see what configuration options it's reading from stdin (which it does due to using -stay_open True -@ -):

read(0, "
-json
-struct
-use
MWG
-*Duration*#
-GPSAltitude#
-GPSLatitude#
-GPSLongitude#
-GPSPosition#
-Orientation#
-FocalLength#
-all
/usr/src/app/upload/library/admin/2024/2024-05-05/PXL_20240504_233345242.mp4
-ignoreMinorErrors
-execute
", 65536) = 231

And so I was able to replicate the issue by running locally: /usr/bin/perl -w /usr/bin/vendor_perl/exiftool -api largefilesupport=1 -stay_open True -@ -

Then I can execute a sequence of commands like:

/home/stephen/Downloads/PXL_20240504_233345242.mp4
-execute

What seems to be happening is that after -execute, the next commands lose the -api largefilesupport=1, so in order to make it work, you need to add the flag for every command like so:

-api
largefilesupport=1
-json
-struct
-use
MWG
-*Duration*#
-GPSAltitude#
-GPSLatitude#
-GPSLongitude#
-GPSPosition#
-Orientation#
-FocalLength#
-all
/home/stephen/Downloads/PXL_20240504_233345242.mp4
-ignoreMinorErrors
-execute

So the original solution of passing the args when constructing exiftool doesn't seem viable, and exiftool-vendored doesn't seem to have any ability to load in an argument to be used each time we use it, only args to spawn the initial process, so the solution probably is to pass the largefilesupport flag in each place we call exiftool functions.
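
To make the per-command requirement concrete, here is a small TypeScript sketch that builds the block written to the stdin of a -stay_open exiftool process for one file, repeating the flag each time. The helper name `buildStayOpenBlock` is hypothetical; exiftool-vendored assembles its commands internally and may do so differently.

```typescript
// Arguments that have to be repeated for every command sent to a
// -stay_open process, because they do not carry over past -execute.
const PER_COMMAND_ARGS = ["-api", "largefilesupport=1"];

// Returns the newline-separated block for one file, ending with -execute.
function buildStayOpenBlock(file: string, readArgs: string[]): string {
  return [...PER_COMMAND_ARGS, ...readArgs, file, "-execute", ""].join("\n");
}

// Example: the block for a single JSON read.
const block = buildStayOpenBlock("/home/stephen/Downloads/PXL_20240504_233345242.mp4", ["-json"]);
console.log(block);
```

Each argument sits on its own line, matching the -@ argfile format seen in the strace output above.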

So digging through immich and exiftool-vendored code, we use exiftool-vendored in 3 places for read, extractBinaryTag, and writeTags. I'm not sure if it's possible to get an error running extractBinaryTag or writeTags on a file >2GB, so I just investigated adding the largefilesupport tag to the read command.

The read command in exiftool-vendored accepts additional args either through the optionalArgs parameter or through options.optionalArgs. As it happens, arguments passed to read through the optionalArgs parameter (e.g. .read(path, ['-api', 'largefilesupport=1'], {) seem to get overwritten by the expansion of options in the read function in ExifTool.ts. At least I assume that's what's happening, since reordering so that optionalArgs is last allows me to pass in arguments:

        optionalArgs,
        ...pick(this.options, ...ReadTaskOptionFields),
        ...options,
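
The overwrite can be reproduced with plain object spread. This is a simplified TypeScript model, assuming the snippet above shows the library's ordering with optionalArgs before the spreads; `ReadOptions`, `defaults`, and `positionalArgs` are hypothetical stand-ins rather than exiftool-vendored types.

```typescript
// Hypothetical, trimmed-down options shape for illustration only.
interface ReadOptions {
  optionalArgs?: string[];
  useMWG?: boolean;
}

// Stand-in for defaults that already carry an (empty) optionalArgs field.
const defaults: ReadOptions = { optionalArgs: [], useMWG: false };
const positionalArgs = ["-api", "largefilesupport=1"];

// Positional value first: the later spread of defaults replaces it.
const overwritten: ReadOptions = { optionalArgs: positionalArgs, ...defaults };

// Positional value last: it survives the spread.
const preserved: ReadOptions = { ...defaults, optionalArgs: positionalArgs };

console.log(overwritten.optionalArgs, preserved.optionalArgs);
```

In an object literal the last occurrence of a key wins, which would explain why only the reordered version lets the positional arguments through.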

Thankfully, the optionalArgs field of options seems to work fine, so we can pass in the flag there:

diff --git a/server/src/repositories/metadata.repository.ts b/server/src/repositories/metadata.repository.ts
index 5baf07829..eca7a0bef 100644
--- a/server/src/repositories/metadata.repository.ts
+++ b/server/src/repositories/metadata.repository.ts
@@ -21,26 +21,27 @@ export class MetadataRepository implements IMetadataRepository {
   ) {
     this.logger.setContext(MetadataRepository.name);
   }

   async teardown() {
     await exiftool.end();
   }

   readTags(path: string): Promise<ImmichTags | null> {
     return exiftool
       .read(path, undefined, {
         ...DefaultReadTaskOptions,

+        optionalArgs: ['-api', 'largefilesupport=1'],
         defaultVideosToUTC: true,
         backfillTimezones: true,
         inferTimezoneFromDatestamps: true,
         useMWG: true,
         numericTags: [...DefaultReadTaskOptions.numericTags, 'FocalLength'],
         /* eslint unicorn/no-array-callback-reference: off, unicorn/no-array-method-this-argument: off */
         geoTz: (lat, lon) => geotz.find(lat, lon)[0],
       })
       .catch((error) => {
         this.logger.warn(`Error reading exif data (${path}): ${error}`, error?.stack);
         return null;
       }) as Promise<ImmichTags | null>;
   }

I think this means the previous change can be reverted since we don't need to maintain an instance of exiftool and can just call the methods statically.

Also it looks like exiftool.write() should be able to accept extra flags so it might be helpful to add the lfs flag there too. It doesn't look like it's possible to pass any extra flags to extractBinaryTagToBuffer since the options interface only picks specific fields.

I created a PR with this fix here: https://github.com/immich-app/immich/pull/10167

stephen304 commented 5 months ago

Fix is now merged and I can confirm that it works using the latest container built from the main branch and no more workaround, just refresh metadata and then reload the page :tada:

TransRapid commented 5 months ago

I actually left my symbolic link that I placed there originally. It is a good option to have additional controls if need be. It references the same one I use outside of Immich.