immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0
42.63k stars 2.08k forks source link

Move to client side hashing #1553

Open jrasm91 opened 1 year ago

jrasm91 commented 1 year ago

Feature detail

Before upload, compute a client side hash in the mobile app and use that (eventually in combination with #731) to determine if an asset should be uploaded.

Platform

Mobile App

mike-lloyd03 commented 1 year ago

As an alternative to a simple SHA hash of the file, I suggest using an algorithm which allows to detect for near duplicate photos: photos that are visually identical but would result in different hashes as a result of compression, resizing, or filetype conversion. I've used this library written in Go to create a duplicate image finder which could pick up duplicates between originals on my iPhone and those which had been through Google Photos' compression algo. Photoprism is also using this library to detect duplicates.

However, I'm not totally sure how something like this would be used to prevent the client from uploading an existing duplicate to the server as it doesn't generate a unique artifact like hashing does. But I figured it was worth mentioning while this is being considered.

bo0tzz commented 1 year ago

@mike-lloyd03 if we implement similarity detection that will be server side only. The current implementation is hash only. You can track #644 if interested in the fuzzy deduplication.

ikaruswill commented 1 year ago

As mentioned by @bo0tzz, I believe we should scope this issue to only duplicate detection rather than similar photo detection as it is a more foundational functionality of backup and sync.

Scenario The main scenario is when the phone is reinitialized, the Immich app loses its sync state and recognizes all photos as new photos.

Why this deserves priority

Own context

nijhawank commented 1 year ago

+1 for client side hashing + deduplicating similar (not identical) photos. Similar looking photos could be collapsed into a single one (similar to how a burst photo is shown) on iOS

smnhdy commented 10 months ago

Is there any update on this FR? This is a blocking point ofr my iOS device users, as they have 50k+ photos in their iclouds, and immich is trying to upload everything every time it's installed. This is after i manually imported all photos via CLI.

athornfam2 commented 10 months ago

Growing library of multiple family members with 10K photos at least combined. Hoping this comes out soon for iPhone users.

sgloutnikov commented 10 months ago

Noticed today on a fresh iOS install that the application properly detected photos that were already uploaded to the server, and the cloud checkmark appeared in the corner of the photos. That however didn't change the files to be sent to the server and the mobile application wanted to upload the full library to the server.

DX37 commented 9 months ago

Noticed today on a fresh iOS install that the application properly detected photos that were already uploaded to the server, and the cloud checkmark appeared in the corner of the photos. That however didn't change the files to be sent to the server and the mobile application wanted to upload the full library to the server.

Same thing on Android.

smnhdy commented 9 months ago

Strange... not the experience I get.

I reset my iPhone this week, and did a fresh install of 1.86 and it's now trying to upload all50k photos...

DX37 commented 9 months ago

Strange... not the experience I get.

I reset my iPhone this week, and did a fresh install of 1.86 and it's now trying to upload all50k photos...

Check for Duplicated Assets in Immich settings (local storage, I guess). The number of assets maybe growing while uploading...

smnhdy commented 7 months ago

I don't see any comments that this feature is on the roadmap at all.. is there any official word on this?

bo0tzz commented 7 months ago

This is definitely planned, we just don't have that many people working on the mobile app.

p7996619 commented 7 months ago

I'd like to add that it would probably also be better performance-wise to switch to a more modern hash algorithm that can run in parallel, e.g. BLAKE3 or XXH3