immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

[BUG] Wrong Date but Exif good #5164

Open stevenwalton opened 7 months ago

stevenwalton commented 7 months ago

The bug

I was testing a way to back up my Google Photos and thought, hey, I'll try to move some photos to the device and see if Immich automatically syncs them. Not a great solution because you can't batch move images to the device, but that's not important.

So I tested with a few of my oldest images. As expected, Immich backed up the photos. Success? Not quite: these photos show up at the newest end of my timeline. So I went to today's Immich folder and ran exiftool myimg.jpg. Looking at the output, things look right

$ /path/immich_library/library/admin/2023/2023-11-19 $ exiftool 2012041619xxxx+1.jpg | grep -i date
File Modification Date/Time     : 2023:11:19 13:xx:xx-08:00
File Access Date/Time           : 2023:11:19 14:xx:xx-08:00
File Inode Change Date/Time     : 2023:11:19 14:xx:xx-08:00
Modify Date                     : 2012:04:16 19:xx:xx
...

In this case we can probably tell which date is correct from the name of the file, but that's probably not something to rely on. The x's are artificial because you don't need the minute and second the photo was taken. The other exif data is also good but irrelevant to the topic at hand (it correctly captures the phone I used, the software that processed the image, etc.).

I believe that Immich is grabbing the most recent time rather than the actual date the photo was taken (in 2012). In the web interface we also see that it says the photo was taken at Nov 19, 2023 Sun, 9:36 PM GMT, which does not seem to match any time I'm seeing.

I don't know much about the internals of this software, so I don't know whether just changing the key it looks for would resolve this issue or break others; I only know that this is unexpected behavior. I apologize if someone else has already brought this up, but I couldn't find it.

The OS that Immich Server is running on

Debian 12 (bookworm) on a raspberry pi 4

Version of Immich Server

v1.86.0

Version of Immich Mobile App

v1.87.9 build.111

Platform with the issue

Your docker-compose.yml content

Standard with the 3 lines for hardware acceleration commented out

Your .env content

Standard

Edit

I didn't check hard enough. Some images did get properly moved to the correct location on the timeline! Here's the same output for images that DID move correctly

endurance@endurance:/epool/immich_library/library/admin/2014/2014-01-14 $ exiftool 20140114_09xxxx+1.jpg | grep -i date
File Modification Date/Time     : 2023:11:19 13:xx:xx-08:00
File Access Date/Time           : 2023:11:19 15:xx:xx-08:00
File Inode Change Date/Time     : 2023:11:19 13:xx:xx-08:00
Modify Date                     : 2014:01:14 09:xx:xx
Date/Time Original              : 2014:01:14 09:xx:xx
Create Date                     : 2014:01:14 09:xx:xx

It appears their exif data is slightly different. The systematic pattern is that the files that did not move correctly were taken with the "Pudding camera" app, which looks to generate slightly different data. Those photos were also taken on a Droid 2, while the correctly placed photos are from a Galaxy S3.

alextran1502 commented 7 months ago

Are all the jobs finished yet? especially the extract metadata job?

stevenwalton commented 7 months ago

Wow, impressive response time! (Thanks for the project too!)

I believe so but we can give it an hour to check and I can also reboot the container.

Looking at my logs I see a bunch of lines such as

immich_microservices       | [Nest] 7  - 11/19/2023, 10:xx:xx PM     LOG [MediaService] Successfully generated WEBP video thumbnail for asset 4be68fbc-4931-408f-959a-4251e32e4fc3

Further up there are errors like this

immich_microservices       | [Nest] 7  - 11/19/2023, 9:xx:xx PM   ERROR [StorageTemplateService] Error: Connection terminated due to connection timeout
immich_microservices       |     at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:132:73)
immich_microservices       |     at Object.onceWrapper (node:events:628:28)
immich_microservices       |     at Connection.emit (node:events:514:28)
immich_microservices       |     at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:63:12)
immich_microservices       |     at Socket.emit (node:events:514:28)
immich_microservices       |     at TCP.<anonymous> (node:net:337:12)
immich_microservices       | [Nest] 7  - 11/19/2023, 9:xx:xx PM   ERROR [StorageTemplateService] Object:
immich_microservices       | {
immich_microservices       |   "id": "efc899e2-debc-447d-a72a-081fda2d074a",
immich_microservices       |   "oldPath": "upload/upload/bf823dda-38c1-472f-971e-f74cd0ba3e75/13340531-fa92-4b0b-a64e-4402330ab2a6.jpg",
immich_microservices       |   "newPath": "upload/library/admin/2014/2014-01-14/20140114_11xxxx+1.jpg"
immich_microservices       | }
immich_microservices       | 

But these are for the images that actually got placed in the correct location. I also have some errors about the ML jobs erroring out but I'm not too worried about those. I can get them if you want (it's only classified two faces and two images. Server's been up for over a week but initial upload of images had the ML container shut down).

I grepped for the year because grepping for the exact date didn't find anything and there was too much to look through by hand

endurance@endurance:~/immich $ docker-compose logs --follow | grep "2012"
...
immich_microservices       | [Nest] 7  - 11/19/2023, 10:01:07 PM     LOG [StorageCore] Attempting to finish incomplete move: upload/upload/bf823dda-38c1-472f-971e-f74cd0ba3e75/f40f67cc-d98c-41ad-9780-efbd85e19a35.jpg => upload/library/admin/2023/2023-11-19/2012041619xxxx.jpg

This is the only message associated with the previous image. There are only two of these messages; the other matches are errors. Here's an example. From the above command, they all have this same form

immich_microservices       | [Nest] 7  - 11/16/2023, 12:01:07 AM   ERROR [JobService] Unable to run job handler (thumbnailGeneration/generate-webp-thumbnail): Error: Input file is missing: upload/library/admin/2022/2022-12-13/Screenshot_20221213-20xxxx.png
immich_microservices       | [Nest] 7  - 11/16/2023, 12:01:07 AM   ERROR [JobService] Error: Input file is missing: upload/library/admin/2022/2022-12-13/Screenshot_20221213-20xxxx.png
immich_microservices       | [Nest] 7  - 11/16/2023, 12:01:07 AM   ERROR [JobService] Unable to run job handler (thumbnailGeneration/generate-webp-thumbnail): Error: Input file is missing: upload/library/admin/2022/2022-12-13/Screenshot_20221213-20xxxx.png

It's only two files actually and both have leading "Screenshot" but there are 9 instances of the error. This is all I see grepping for the 2012 year.

Clearly my server's clock is incorrect so ignore the other time issue above about being 9pm.

Edit: guess the classifier is processing these images since it is updating (but also not important here)

stevenwalton commented 7 months ago

Following up, I restarted the servers and gave it a try. I let them sit for a while (30 mins ish). Once no new actions appeared in the logs and the power draw and top output had normalized, I shut down and updated. I am now on server v1.87.0. No logs are being produced other than for the actions I perform, and the problem persists.

Is there a way to force a recheck? This would also be useful as I do have some photos that did not properly upload. This is likely unrelated but just trying to provide additional info. Similarly a likely unrelated issue is that some files did upload that were not from the google photo folders I selected, such as album art. I understand there is some update mechanism but I am unsure what the proper call for this is.

stevenwalton commented 7 months ago

Okay, so I looked at this problem a bit more and I think I figured out what's going on. I'm pretty sure Immich is reading the wrong exif tag. This is a rather weird thing that I found, but looking at my sidecar data, the only valuable metadata is the description, album name, and people tagged in photos (it may be nice to integrate this with the face identification so you can automatically tag people by name!). GPS data and creation time data are wrong.

The bigger issue is that, looking at the exif data in the actual image itself, there are multiple versions of some tags, specifically time. I believe Google is prepending data and so it just stacks up. But we have unique tags that do have the correct value, and so I believe those should take priority if they're found.

Let's see an example

Some data removed for privacy or readability (there are >120 tags!)

$ exiftool PXL_20231129_043736033.jpg 

ExifTool Version Number         : 12.70
File Name                       : PXL_20231129_043736033.jpg
Directory                       : .
File Size                       : 1763 kB
File Modification Date/Time     : 2023:11:30 20:39:18-08:00
File Access Date/Time           : 2023:11:30 20:39:21-08:00
File Inode Change Date/Time     : 2023:11:30 20:39:19-08:00
File Permissions                : -rw-r--r--
File Type                       : JPEG
File Type Extension             : jpg
MIME Type                       : image/jpeg
Exif Byte Order                 : Little-endian (Intel, II)
Make                            : Google
Camera Model Name               : Pixel 6 Pro
...
Modify Date                     : 2023:11:28 20:37:36
...
Date/Time Original              : 2023:11:28 20:37:36
Create Date                     : 2023:11:28 20:37:36
Offset Time                     : -08:00
Offset Time Original            : -08:00
Offset Time Digitized           : -08:00
....
GPS Time Stamp                  : 04:37:30
GPS Date Stamp                  : 2023:11:29
...
Profile Date Time               : A long time ago in a galaxy far away
...
Create Date                     : 2023:11:28 20:37:36.022-08:00
Date/Time Original              : 2023:11:28 20:37:36.022-08:00
Modify Date                     : 2023:11:28 20:37:36.022-08:00
GPS Altitude                    : ...
GPS Date/Time                   : 2023:11:29 04:37:30Z
...

Remember that Google names the files based on the UTC timestamp (which is why I redacted it in my first message). So PXL_20231129_043736033.jpg => 29 Nov 2023 at 04:37:36 UTC. I'm in PST (indicated by the -08:00), so that's 20:37 on 28 Nov local time. We see that the first instances of the tags Modify Date, Date/Time Original, and Create Date are suspiciously close to the time of this comment and in local time (and indicating the timezone information). Times might be similar but that's coincidence, notice the date. We probably shouldn't trust these ones. But the later instances of these same tags, along with GPS Date/Time, are the actual time the photo was taken (GPS has Z, so presumably Zulu, and the small difference may just be GPS drift). We can't rely on GPS Date/Time, though, because a user may not have GPS enabled, in which case it won't be included.
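The file name arithmetic above can be checked with a short sketch. It assumes the `PXL_YYYYMMDD_HHMMSSmmm` pattern inferred from this one example file; `parsePxlFileName` and `formatAtOffset` are hypothetical helper names, not part of Immich or ExifTool.

```typescript
// Hypothetical helper: read a Pixel-style file name as a UTC timestamp.
// The PXL_YYYYMMDD_HHMMSSmmm pattern is an assumption based on the example above.
function parsePxlFileName(name: string): Date | undefined {
  const m = /^PXL_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})(\d{3})/.exec(name);
  if (!m) return undefined;
  const [year, month, day, hour, minute, second, ms] = m.slice(1).map(Number);
  // Date.UTC expects a zero-based month.
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second, ms));
}

// Hypothetical helper: render the wall-clock time at a fixed offset such as "-08:00".
function formatAtOffset(utc: Date, offset: string): string {
  const m = /^([+-])(\d{2}):(\d{2})$/.exec(offset);
  if (!m) throw new Error(`unsupported offset: ${offset}`);
  const sign = m[1] === "-" ? -1 : 1;
  const offsetMs = sign * (Number(m[2]) * 60 + Number(m[3])) * 60000;
  // Shift the instant, then read the shifted value's UTC fields as local fields.
  const shifted = new Date(utc.getTime() + offsetMs);
  return shifted.toISOString().slice(0, 19).replace("T", " ") + offset;
}

const pxlTaken = parsePxlFileName("PXL_20231129_043736033.jpg");
if (pxlTaken) {
  console.log(pxlTaken.toISOString());             // 2023-11-29T04:37:36.033Z
  console.log(formatAtOffset(pxlTaken, "-08:00")); // 2023-11-28 20:37:36-08:00
}
```

The local result lands on 28 Nov at 20:37:36, which agrees with the Date/Time Original value in the exiftool output.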

Going through a bunch of files, I find that in every case it was the last instance of Modify Date, Date/Time Original, and Create Date that was correct; the first instance was not guaranteed to be. I mentioned earlier that Google prepends data. I believe this because the first instance exactly corresponds with the time I uploaded that photo to Google Photos, which is why it's so close to this comment's time.

TLDR: use the last instance of the Date/Time Original tag and you're all good.
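To make the "last instance" idea concrete, here is a minimal sketch that groups the default exiftool text output by label and reads the last value for a label. `collectByLabel` and `lastValue` are hypothetical names, and the input lines are copied from the output above. Note that these labels are descriptions, so a repeated label may stand for more than one underlying tag.

```typescript
// Collect every value printed for each label in exiftool's default output.
function collectByLabel(output: string): Map<string, string[]> {
  const byLabel = new Map<string, string[]>();
  for (const line of output.split("\n")) {
    const sep = line.indexOf(" : ");
    if (sep < 0) continue; // skip "..." and blank lines
    const label = line.slice(0, sep).trim();
    const value = line.slice(sep + 3).trim();
    const values = byLabel.get(label) ?? [];
    values.push(value);
    byLabel.set(label, values);
  }
  return byLabel;
}

// Return the last value seen for a label, or undefined if the label is absent.
function lastValue(byLabel: Map<string, string[]>, label: string): string | undefined {
  const values = byLabel.get(label);
  return values === undefined ? undefined : values[values.length - 1];
}

const sampleExifOutput = [
  "Modify Date                     : 2023:11:28 20:37:36",
  "Date/Time Original              : 2023:11:28 20:37:36",
  "Create Date                     : 2023:11:28 20:37:36",
  "...",
  "Create Date                     : 2023:11:28 20:37:36.022-08:00",
  "Date/Time Original              : 2023:11:28 20:37:36.022-08:00",
  "Modify Date                     : 2023:11:28 20:37:36.022-08:00",
].join("\n");

console.log(lastValue(collectByLabel(sampleExifOutput), "Date/Time Original"));
// 2023:11:28 20:37:36.022-08:00
```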

Edit: unrelated side note, google seems to have updated the motion picture file extension to "MP". Exif shows the mime type is video/mp4 and if you change the file extension they read just fine. Idk how Immich handles this, but it seems the future proof way would be to rely on exif data instead of extensions because it looks like (via github search) you whitelist the extensions.

Batwam commented 7 months ago

Rather than using the "last" instance, you probably want to find the accurate tag name (rather than the description) you want to rely on, since multiple tags appear to have Date/Time Original in their description.

What is the full name of the tag you'd want to use ("last" tag) if you do exiftool -s -time:all PXL_20231129_043736033.jpg?

stevenwalton commented 7 months ago

$ exiftool -s -time:all PXL_20231129_043736033.jpg

FileModifyDate                  : 2023:11:30 20:39:18-08:00
FileAccessDate                  : 2023:11:30 21:49:18-08:00
FileInodeChangeDate             : 2023:11:30 21:40:08-08:00
ModifyDate                      : 2023:11:28 20:37:36
DateTimeOriginal                : 2023:11:28 20:37:36
CreateDate                      : 2023:11:28 20:37:36
OffsetTime                      : -08:00
OffsetTimeOriginal              : -08:00
OffsetTimeDigitized             : -08:00
SubSecTime                      : 022
SubSecTimeOriginal              : 022
SubSecTimeDigitized             : 022
GPSTimeStamp                    : 04:37:30
GPSDateStamp                    : 2023:11:29
ProfileDateTime                 : 2016:12:08 09:38:28
SubSecCreateDate                : 2023:11:28 20:37:36.022-08:00
SubSecDateTimeOriginal          : 2023:11:28 20:37:36.022-08:00
SubSecModifyDate                : 2023:11:28 20:37:36.022-08:00
GPSDateTime                     : 2023:11:29 04:37:30Z

So CreateDate looks good here. But I'm trying to think of this in general because Immich needs to not just read google photos, but many others. In the other issue that referenced this one, I picked a random imgur image that didn't have integers in the name. Let's do the same

$ exiftool -s -time:all Downloads/jEkVxub.gif

FileModifyDate                  : 2023:11:30 23:03:57-08:00
FileAccessDate                  : 2023:11:30 23:03:58-08:00
FileInodeChangeDate             : 2023:11:30 23:03:57-08:00

That does correspond to me downloading right now but you can see that they are different tags. So it makes sense that you would pull time data from something like FileModifyDate but if we used that tag for the Pixel image we'd get the wrong date, which is exactly what happened for the original issue. I'm not going to pretend to be an expert on this and I'm sure there's better solutions, but to me this is indicating there needs to be a hierarchy of tags that take precedence over another because there's clearly no universal option. You can't even take the earliest date since the profile date is way off. Which kinda sucks, but it very much explains the problem that is the reason for this issue in the first place.
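A hierarchy like the one described could be sketched as follows. The order in `DATE_TAG_PRECEDENCE` is a hypothetical example for illustration, not Immich's actual list: embedded capture-time tags come first, file-system dates come last, and `ProfileDateTime` is deliberately left out because it describes the colour profile rather than the photo.

```typescript
type TimeTags = Partial<Record<string, string>>;

// Hypothetical precedence order (an assumption, not Immich's list).
const DATE_TAG_PRECEDENCE: string[] = [
  "SubSecDateTimeOriginal",
  "DateTimeOriginal",
  "SubSecCreateDate",
  "CreateDate",
  "ModifyDate",
  "FileModifyDate", // last resort: only reflects when the file was written
];

// Return the first tag in precedence order that has a value.
function pickByPrecedence(tags: TimeTags): { tag: string; value: string } | undefined {
  for (const tag of DATE_TAG_PRECEDENCE) {
    const value = tags[tag];
    if (value !== undefined) return { tag, value };
  }
  return undefined;
}

// Values copied from the two exiftool -s outputs above.
const pixelTimeTags: TimeTags = {
  FileModifyDate: "2023:11:30 20:39:18-08:00",
  DateTimeOriginal: "2023:11:28 20:37:36",
  ProfileDateTime: "2016:12:08 09:38:28",
  SubSecDateTimeOriginal: "2023:11:28 20:37:36.022-08:00",
};
const gifTimeTags: TimeTags = { FileModifyDate: "2023:11:30 23:03:57-08:00" };

console.log(pickByPrecedence(pixelTimeTags)?.tag); // SubSecDateTimeOriginal
console.log(pickByPrecedence(gifTimeTags)?.tag);   // FileModifyDate
```

With this shape the Pixel photo keeps its capture time, while the gif, which has no embedded dates, still gets a date from the file system.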

(I'm trying to see if I have a picture from an iPhone somewhere and I'll edit the comment at that time) Edit: Found a photo from an iPhone but downloaded from Google photos. It has all the same tags as the pixel photo except it is missing the GPS tags and SubSecTime (and the profile time has a dummy value). CreateDate looks to be the correct one but again is in local time.

Let's try to get on topic though: What is Immich using?

Batwam commented 7 months ago

If I'm not mistaken, it's specified in row #30 of this file

/** look for a date from these tags (in order) */
const EXIF_DATE_TAGS: Array<keyof Tags> = [
  'SubSecDateTimeOriginal',
  'DateTimeOriginal',
  'SubSecCreateDate',
  'CreationDate',
  'CreateDate',
  'SubSecMediaCreateDate',
  'MediaCreateDate',
  'DateTimeCreated',
];

Is this prioritisation consistent with what you are seeing?
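One way to check the question is to run the quoted names against the tags reported earlier in the thread. The sketch below assumes the array is consumed in order, as its comment suggests, and that the elided lines of the first exiftool output contain none of the listed tags; both are assumptions. `firstListedTag` is a hypothetical helper, and the names are typed as plain strings so the sketch runs without the `Tags` type from the quoted snippet.

```typescript
// Tag names copied from the list quoted above.
const QUOTED_DATE_TAGS: string[] = [
  "SubSecDateTimeOriginal",
  "DateTimeOriginal",
  "SubSecCreateDate",
  "CreationDate",
  "CreateDate",
  "SubSecMediaCreateDate",
  "MediaCreateDate",
  "DateTimeCreated",
];

// Return the first listed tag that is present, or undefined when none match.
function firstListedTag(tags: Record<string, string | undefined>): string | undefined {
  return QUOTED_DATE_TAGS.find((name) => tags[name] !== undefined);
}

// Date tags shown for the misplaced 2012 photo (assuming nothing relevant was elided).
const misplacedPhotoTags = {
  FileModifyDate: "2023:11:19 13:xx:xx-08:00",
  ModifyDate: "2012:04:16 19:xx:xx",
};
// Date tags shown for the correctly placed 2014 photo.
const placedPhotoTags = {
  FileModifyDate: "2023:11:19 13:xx:xx-08:00",
  ModifyDate: "2014:01:14 09:xx:xx",
  DateTimeOriginal: "2014:01:14 09:xx:xx",
  CreateDate: "2014:01:14 09:xx:xx",
};

console.log(firstListedTag(misplacedPhotoTags)); // undefined
console.log(firstListedTag(placedPhotoTags));    // DateTimeOriginal
```

Under those assumptions, the misplaced photo matches none of the listed tags because `ModifyDate` is not in the list, which would be consistent with it landing on a 2023 date.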

skynetua commented 7 months ago

Is it possible to not change or at least store somewhere the following dates after uploading?

File Modification Date/Time 
File Access Date/Time 
File Inode Change Date/Time 

When a photo from a messenger is uploaded to Immich, it is shown at the date it was received, but after downloading it back from Immich it loses this date and is shown as today. Google Photos has this issue as well.

Found #3900

stevenwalton commented 7 months ago

@Batwam, sure, metadata.service.ts:30 has some information, but it's not addressing the tags at hand. AND it is not a prioritization list, it is just an array. It's used here and here.

There doesn't appear to be logic that matches DateTimeOriginal and Date/Time Original either. And the File dates aren't even there. And the lines above look like they're prioritizing sidecar data, which my investigation shows to be utterly unreliable. Fwiw, until I switched over to Immich I had more privacy-maximal settings, and I do not think it is unreasonable to assume many other users will be in a similar boat. Sidecar data seems reliable only for: album names, tagging of auxiliary information (such as people), photo description, and view count. It is not consistent for date/time or GPS (I've never been to Null Island but Immich sure thinks I have).
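The Null Island symptom suggests guarding against placeholder coordinates. Below is a minimal sketch, assuming the unreliable sidecar data uses latitude 0 and longitude 0 when no real location is available; `hasUsableGps` is a hypothetical helper and not an Immich function.

```typescript
// Treat missing, non-finite, out-of-range, or exactly (0, 0) coordinates as "no location".
function hasUsableGps(lat: number | undefined, lon: number | undefined): boolean {
  if (lat === undefined || lon === undefined) return false;
  if (!Number.isFinite(lat) || !Number.isFinite(lon)) return false;
  if (Math.abs(lat) > 90 || Math.abs(lon) > 180) return false;
  // (0, 0) is "Null Island": far more likely a placeholder than a real photo location.
  return !(lat === 0 && lon === 0);
}

console.log(hasUsableGps(0, 0));             // false
console.log(hasUsableGps(37.77, -122.42));   // true
```

The trade-off is that a photo genuinely taken at exactly 0, 0 would also be discarded, which seems acceptable for a consumer photo library.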

As @skynetua is pointing out, there are plenty of weird issues going on and you can find a lot of open issues with datetime. Certainly we should be at least supporting whatever Google Photos and iCloud are doing. But it also seems to be an evolving environment and as is typical, anything to do with time is just a mess. My domain expertise is not really going to help here (I'd be better providing support on the ML side) so all I can do is report what I see and make a suggestion that should really only be a launching point not a solution (better suited for someone that actually understands all this exif mess haha)

psyciknz commented 2 days ago

I believe there is a similar issue when using XMP files that contain only a create/modify date: an asset (upon metadata refresh) then takes the XMP file date.

I think there needs to be the ability to prioritise a date field, i.e. a user can select that the Exif date taken or Exif DateTimeOriginal takes priority over any other date. Only if that is missing should a file date be used.

It was discussed on Discord here: https://discord.com/channels/979116623879368755/1257530194827411517 Since it does seem to be the same issue that is described here (date field priority), I chose to append here rather than create a new issue. I can create a new issue if preferred.