toxic0berliner closed this issue 12 months ago.
I don't think this is the intended use case. External libraries are meant for existing photo collections, while uploaded assets go into the default library.
Damn, that would be a bit sad if that's the case. I trashed my previous install... 50k pictures with many faces take over 3 days to scan, plus several weeks to ignore over 40k faces and rename all my friends...
I'm really not sure I'm ready to move everything into the Immich primary library, or even that I should... Is it really that difficult to add the dedup algorithm to external libraries?
It was working fine in the past with my custom script that imported into the library with an external path... But that also started failing in mid-September (new pictures weren't imported), so I thought an external library would be best.
Even if I were to switch to Immich as my primary app, including for backup, I have over 250 GB of pictures on my phone, and I'm not really looking forward to moving them on my NAS from where they are now into Immich...
I tried not granting the permission to access Android pictures, but the app keeps asking. So I can't use an external library at all as long as there is any overlap with the content of the phone, and I can't use the app without it seeing the local Android pictures... That makes it unusable for me. Thankfully I'm not your only user and you don't really need me, sure, but I fail to see why an external library shouldn't be treated like the primary library when the picture on the phone is already on the server in an external library...
I was liking the face recognition, places, timeline, and overall swiftness of the UI. I can't believe I'm the only one with this need, but I'm also not ready to fork or open a PR to fix it as I'm a bad dev, so I hope I can convince you 😁
I am not sure what you are trying to achieve. From my POV you can
library
feature.
I have 250 GB of pictures already on my phone, and they are already on the NAS where I run Immich. I'm just trying to use Immich without moving all my existing pictures. The NAS also stores some pictures and movies that I have since removed from my phone. So ideally I'd import all existing files AND enable backup, all into the primary library, but that would mean moving or duplicating over 500 GB of pictures and videos...
So instead I'd really like to keep the existing files where they are and not enable backup, since the one I already have works fine, but still be able to use Immich to view and analyse all my pictures and share them with friends.
This is why I need the external library AND the Android photos to work together without showing up twice. Otherwise I won't use Immich on my phone, won't invest time in "maintaining" it, and it will ultimately end up completely unused.
I think this is an important issue. I am also experiencing it (while loving Immich overall!) and fully agree.
I'm sure many people possess duplicated photos in their external libraries for a variety of reasons. Some of those reasons may be vestigial or even superfluous. In my personal case, even the result of laziness.
Obviously, there are other deduplication methods that could take care of things like the duplicated folders. But for people with larger photo collections (mine is ~100k), that is a lot to manage and go through. I love the idea of having the Immich UI put all the photos into a timeline for me without too much intervention. It is working so incredibly well!!
As I have pointed out (https://github.com/immich-app/immich/discussions/4240#discussioncomment-7180105), I think there is a relatively simple solution to this: don't display two images in the timeline that share the same file checksum. Why would this ever be the desired behavior? If they are identical images, I am confident that no one would want them displayed next to each other in the timeline. If there are reasons someone would want this, I am very curious to hear them.
How could a solution be implemented? I propose that they could either be considered a type of 'stack' (i.e., keep the assets tracked separately, but displayed as one), or alternatively, subjected to the same checksum searching that already applies to the "Upload" library (i.e., consider it a single asset). The former option could give users more flexibility, the latter may be easier to implement.
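A minimal sketch of the first option (a checksum "stack"), assuming every asset already carries a checksum of its file contents. The `Asset` class and both functions are hypothetical illustrations, not Immich code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    # Hypothetical, simplified asset record; not Immich's real schema.
    asset_id: str
    path: str
    checksum: str  # ASSUMPTION: hex digest of the file contents


def stack_by_checksum(assets: list[Asset]) -> dict[str, list[Asset]]:
    """Option 1: keep every asset tracked, but group identical ones into a stack."""
    stacks: dict[str, list[Asset]] = defaultdict(list)
    for asset in assets:
        stacks[asset.checksum].append(asset)
    return dict(stacks)


def timeline_entries(assets: list[Asset]) -> list[Asset]:
    """Show one representative per stack (the first one seen)."""
    return [members[0] for members in stack_by_checksum(assets).values()]
```

Here `timeline_entries` would show one tile per checksum while `stack_by_checksum` still remembers every file. The second option (a single asset) would keep only one record per checksum and drop the other paths, which is simpler but loses track of where the extra copies live.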
I love Immich and hope to continue using it! I really feel strongly about this though. I would be willing to help out with a PR, although the learning curve would be really steep for me as I am not familiar with the languages used in Immich.
Thanks for everyone's continued efforts on this amazing project!!
Libraries don't currently use checksums, since they are the "source of truth" and there is a significant negative performance impact to generating hashes on large libraries. Even if we had them, checksums have to be unique in the database, so you still have the complexity of managing which file to keep and which to ignore, how to handle that on rescans or file moves, etc. There are also other priorities for libraries, like automatic album creation. Long story short, this is probably not going to be addressed anytime soon, and you are better off using a proper dedupe tool instead.
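To make the uniqueness point concrete, here is a toy SQLite table with a uniqueness constraint on `(owner_id, checksum)`. The table, its columns and `insert_asset` are hypothetical and much simpler than Immich's real schema; the point is only that the second identical file is rejected, and something then has to decide which path to keep.

```python
import sqlite3

# Hypothetical, heavily simplified table; Immich's real schema differs.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE assets (
        id       INTEGER PRIMARY KEY,
        owner_id TEXT NOT NULL,
        path     TEXT NOT NULL,
        checksum TEXT NOT NULL,
        UNIQUE (owner_id, checksum)  -- at most one row per owner per checksum
    )
    """
)


def insert_asset(owner_id: str, path: str, checksum: str) -> bool:
    """Return True if inserted, False if another file already claimed this checksum."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO assets (owner_id, path, checksum) VALUES (?, ?, ?)",
                (owner_id, path, checksum),
            )
        return True
    except sqlite3.IntegrityError:
        # The duplicate is refused; the importer must now pick which path "wins"
        # and what happens if the kept file is later moved or deleted.
        return False
```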
Ahh, thanks for the insight and taking the time to reply.
So if I understand correctly, the upload library is specially designated to calculate the sha1 hash for the assets in it, but external libraries are not.
The part I am not understanding is how the resources required would be any different if I uploaded 100k photos from my phone. If I did this, hypothetically, the hashes would be calculated and presumably recorded in the db. But this isn't possible for the external libraries?
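For a sense of where the cost comes from: hashing a file means reading every byte of it. Below is a sketch of chunked SHA-1 hashing with Python's `hashlib`, offered as an illustration rather than Immich's actual implementation; `sha1_of_file` and its `chunk_size` default are assumptions.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """SHA-1 of a file, read in fixed-size chunks so memory use stays flat."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        # Every byte must be read from disk once, so the cost grows with the
        # total library size (e.g. ~500 GB of I/O for a 500 GB collection).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Presumably the per-file cost is the same whether a photo arrives by upload or by a library scan; what differs is timing. Uploads spread the work out as files arrive, while hashing an existing 100k-photo library means paying for all of it during the scan.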
And I guess what you are saying about being unique in the database means that two assets cannot share a checksum because it's a primary key. This makes sense*. I suppose it would make sense to me intuitively that two duplicate photos (with the same checksum) could be represented by a single asset in the database (since it essentially is). Perhaps it would also start to violate other rules about fields in the database - e.g., can't have more than one file path per asset, likely? I can see how problems would start to pile up.
I can also definitely understand that people running this on a raspberry pi wouldn't find it desirable to run checksum calculations for days on end.
I'm curious how PhotoPrism implements this feature (https://docs.photoprism.app/user-guide/library/duplicates/; they check the sha1 of every file on import to detect duplicates). It is one of the few things it does better: automatically stacking assets when it makes sense to do so (e.g., RAW + JPG versions, identical images, etc.). I understand this is getting outside the scope of what Immich was designed to do. It's just that Immich is so awesome at everything else that it is very tempting to integrate this feature.
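As a rough illustration of the RAW + JPG case only (not identical-image detection), one simple heuristic is to pair files that share a folder and base name. This is a hypothetical sketch; it is not a description of how PhotoPrism actually decides what to stack, and the extension lists are illustrative.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Illustrative extension lists only; real tools recognise many more formats.
RAW_EXTENSIONS = {".cr2", ".cr3", ".nef", ".arw", ".dng", ".raf", ".orf"}
JPEG_EXTENSIONS = {".jpg", ".jpeg"}


def pair_raw_and_jpeg(paths: list[str]) -> dict[str, dict[str, str]]:
    """Pair files in the same folder with the same base name, e.g. IMG_1.CR2 + IMG_1.JPG."""
    groups: dict[str, dict[str, str]] = defaultdict(dict)
    for path in paths:
        p = PurePosixPath(path)
        ext = p.suffix.lower()
        key = str(p.with_suffix(""))  # same folder + same stem => assumed same shot
        if ext in RAW_EXTENSIONS:
            groups[key]["raw"] = path
        elif ext in JPEG_EXTENSIONS:
            groups[key]["jpeg"] = path
    # Only groups that have both a RAW and a JPEG are candidates for a stack.
    return {k: v for k, v in groups.items() if "raw" in v and "jpeg" in v}
```

For example, `["/p/IMG_1.CR2", "/p/IMG_1.JPG", "/p/IMG_2.jpg"]` yields a single pair keyed by `/p/IMG_1`; the lone `IMG_2.jpg` is left alone.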
I also share @toxic0berliner's concerns regarding dropping other backup methods. I am currently using Nextcloud for auto backups from mobile. I would be happy to lose this method, but it works and is stable for now. So, perhaps something for the future.
I get the impression that there are many users facing the same issue though, because a lot of people are going to be using external libraries like this, and many people WILL have duplicates as I've described, and many will have other methods of backups too. I'm not trying to put more on the current developers' shoulders, just sharing my experience.
I still come back to the same question: why would any user want duplicate images sharing a sha1 hash displayed in the timeline? It seems as simple (ha... I know, is it ever simple?) as offering the option to calculate hashes, recording them in a table in the database, and picking one as the primary asset to display and generate thumbnails for (the first one by mtime? It literally doesn't matter).
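The "pick one as primary" step could look like the sketch below, assuming an optional hashing pass has already produced a SHA-1 and an mtime for each file. `FileRecord` and `choose_primaries` are hypothetical names. Ties are broken by path so the same file stays primary across rescans, which touches on the rescan concern raised earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    # Hypothetical record produced by an optional hashing pass over a library.
    path: str
    sha1: str
    mtime: float  # seconds since the epoch


def choose_primaries(files: list[FileRecord]) -> dict[str, FileRecord]:
    """For each SHA-1, keep the file with the oldest mtime as the one to display.

    Ties on mtime are broken by path so the choice is stable between rescans.
    """
    primaries: dict[str, FileRecord] = {}
    for f in files:
        current = primaries.get(f.sha1)
        if current is None or (f.mtime, f.path) < (current.mtime, current.path):
            primaries[f.sha1] = f
    return primaries
```

The hard part is not this selection but keeping it correct when the primary file is moved or deleted, which would require re-running it for the affected checksum.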
*(EDIT: actually I'm not sure anymore how this is possible, because I do have duplicates in the timeline, meaning they would have the same hash... I obviously do not have a good grasp of how this is all working in the back end, although it's clear that hashes are not calculated for both duplicates)
External libraries are quite different from upload libraries, and we have separate implementations that reflect each use case.
Upload libraries have Immich as the source of truth, and it manages creating, deleting and deduping files.
External libraries have the file system as the source of truth, so we leave creating, deleting and deduping files to the user. Deduping has different semantics in this context and the implementation would be quite different. We realized that skipping hashing makes importing an external library significantly faster, so we didn't add it.
That is not to say hashing and other deduping cannot be done; it is more that it is not as trivial as it seems, and because there were benefits to excluding it (a simpler implementation), we didn't include it originally.
Checksum is a required field, but the value for external library files is just a hash of the file path instead.
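This explains the duplicates in the timeline: two copies of the same photo at different paths get different path-based checksums, so nothing ever matches. Exactly what Immich feeds into the path hash is not stated here, so `path_checksum` below (SHA-1 of the UTF-8 path) is an assumption for illustration.

```python
import hashlib


def path_checksum(path: str) -> str:
    # ASSUMPTION: SHA-1 over the UTF-8 encoded path; the thread only says
    # "a hash of the file path", not exactly what is hashed.
    return hashlib.sha1(path.encode("utf-8")).hexdigest()


def content_checksum(data: bytes) -> str:
    # A hash of the file bytes, which is what content-based dedupe compares.
    return hashlib.sha1(data).hexdigest()


copy_a = "/photos/2023/IMG_0001.jpg"
copy_b = "/photos/phone-backup/IMG_0001.jpg"
bytes_at_a = b"\xff\xd8\xff\xe0 same JPEG data"
bytes_at_b = b"\xff\xd8\xff\xe0 same JPEG data"

# Different paths -> different checksums -> two separate assets in the timeline.
print(path_checksum(copy_a) == path_checksum(copy_b))  # False
# A content hash would have recognised them as one photo.
print(content_checksum(bytes_at_a) == content_checksum(bytes_at_b))  # True
```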
I don't think any user wants duplicates in their external libraries, but they do want external libraries and they got them sooner at the expense of no dedupe checking.
Totally fair! Happy to have it, because that is what drew me in as a user.
Pre-existing duplicates I agree are a separate problem with no easy answer. It was just a surprise that backing my photos up through a separate mechanism (which is indeed recommended upfront in the documentation) results in duplicate uploads from the mobile app to my library.
I would be really interested to learn more about how the current implementation checks duplicates against images in the upload_location, but the code base is massive and I didn't have any luck searching on my own. Any pointers on where to look?
Side note: why not use md5 rather than sha1 since it's a bit less computationally expensive? (EDIT: I guess the speed is fairly comparable, but you get more bits from sha1...)
Honestly, there seem to be two main types of users using Immich right now:
Immich was originally designed to work exactly like Google Photos. With Google Photos you don't have option 2 available in the first place. But there are lots of people looking for self-hosted photos with use case 2 in mind, so libraries were added (after the fact) to accommodate that user group. Upload libraries are really for group one and external libraries are really for group two.
While we want to support more use cases, photo management software is indeed complicated. I'd say that, currently at least, using the upload library and external libraries in tandem is not a great experience, and I think most people are only using one or the other right now. I'm sure it will improve, but it is a current limitation. It's still unclear exactly how they should or will be integrated in the future. There are talks of migrating "partner sharing" to be library-based and other things like that.
Long story short, it is the algorithm Alex picked when he started building, probably because he is not a crypto expert and just made a decision and moved on. By the time more contributors started working on the project, sha1 was already widely incorporated, and migrating to another algorithm would take a fair amount of effort. The benefits of migrating simply were not worth the time: migrating has minimal impact on the users of the system, but it delays other, more critical features that we've decided to build instead. So, do you want to migrate to md5, or get a better search system, a stacked photos implementation, a more robust dedupe implementation, automatic albums for external libraries, etc.? We've decided those features are more important than the algorithm we use for hashing. sha1 is still pretty performant, and many CPUs have dedicated instructions that accelerate it.
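For anyone curious about the md5 vs sha1 trade-off from the side note, here is a quick micro-benchmark with Python's `hashlib`. Numbers vary a lot by CPU and build, so treat the output as indicative only; `throughput_mb_s` is a hypothetical helper. The digest sizes (128 vs 160 bits) are fixed properties of the algorithms.

```python
import hashlib
import time


def throughput_mb_s(name: str, payload: bytes, rounds: int = 10) -> float:
    """Rough single-threaded hashing throughput, in MB/s, for one algorithm."""
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(name, payload).digest()
    elapsed = time.perf_counter() - start
    return len(payload) * rounds / elapsed / 1e6


payload = bytes(8 * 1024 * 1024)  # 8 MiB of zeros; the content doesn't change the speed
for name in ("md5", "sha1"):
    bits = hashlib.new(name).digest_size * 8  # md5: 128 bits, sha1: 160 bits
    print(f"{name}: {bits}-bit digest, ~{throughput_mb_s(name, payload):.0f} MB/s")
```

On hardware with SHA extensions, sha1 may come out close to md5 or even ahead, which is consistent with the point that switching would buy little.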
Thanks so much for all the details. I really appreciate you taking the time! I understand the nuances a lot better now.
I would place myself somewhere between 1 and 2... I do want Immich to be my mobile backup & organization/UI/sharing solution (i.e., a replacement for google photos, obviously), but I also have a large collection of photos, and like the granularity of being able to provide various volumes across various physical locations and not worry about it destroying my collection while the app is in development. I would happily enable a longer processing time to have duplicate detection (but, I have a reasonably powerful server to do this, which many users might not).
I guess the solution for me is to disable the Immich mobile upload entirely until there is progress on this front and rely on 3rd party tools, then clean up the existing duplicates as required, which is easy enough to do (well worth the effort to keep using the excellent application). I suppose that will work, thanks for helping me reach that conclusion - hopefully this discussion helps others too.
I'm happy to continue the discussion if I think of anything productive.
Thanks, @mattjmeier and @jrasm91, for a very productive conversation.
I think that sounds like a good solution in the interim while we continue to work out the kinks around libraries and figure out how to tackle your use case. Thanks for being understanding as well, it is refreshing 🙏.
I think adding an optional feature for "library hashing" could be something we look at in the future as well.
Converting this to a discussion/feature request, as this is not a bug but the current intended behavior. Future optimization might address this issue.
The bug
Given the warning not to use Immich as the sole backup app for your pictures, I am still using an external app that backs up all my pictures from my Android phone to my NAS. I just moved from a custom importer script to the external library feature.
But now, Immich can no longer recognize that the same picture is both on my phone and on the server. I get a duplicate for each picture: one with a cloud-only icon for the copy on the server, and one with a crossed-out cloud for the copy on my phone.
In the past I used to get proper deduplication, with a single picture and a checkmark inside the little cloud icon.
Maybe something broke and external libraries are not matched against local Android pictures?
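One way to model the icons described above: local and server assets are keyed by checksum and merged into one timeline. This is a simplified, hypothetical sketch (the names and the exact matching rule are assumptions, not the mobile app's code), but it shows why path-based checksums on external library assets can never match a content hash computed on the phone.

```python
from enum import Enum


class BackupState(Enum):
    # Labels mirror the icons described in the bug report.
    BACKED_UP = "cloud with checkmark"  # on the phone and on the server
    LOCAL_ONLY = "crossed-out cloud"    # only on the phone
    REMOTE_ONLY = "cloud only"          # only on the server


def merge_timeline(local: set[str], remote: set[str]) -> dict[str, BackupState]:
    """Merge phone and server assets by checksum into one timeline entry each.

    Matching only works if both sides hash the same thing, e.g. the file bytes.
    """
    merged: dict[str, BackupState] = {}
    for checksum in local | remote:
        if checksum in local and checksum in remote:
            merged[checksum] = BackupState.BACKED_UP
        elif checksum in local:
            merged[checksum] = BackupState.LOCAL_ONLY
        else:
            merged[checksum] = BackupState.REMOTE_ONLY
    return merged


# Strings stand in for real checksums.
phone = {"content:IMG_1", "content:IMG_2"}
upload_library = {"content:IMG_1"}          # server hashed the bytes too
external_library = {"path:/nas/IMG_1.jpg"}  # server hashed the path instead

print(len(merge_timeline(phone, upload_library)))    # 2: IMG_1 matched
print(len(merge_timeline(phone, external_library)))  # 3: IMG_1 shown twice
```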
The OS that Immich Server is running on
Docker image running on Ubuntu 22.04
Version of Immich Server
v1.81.1
Version of Immich Mobile App
1.80.0 build.104