vale46n1 / immich_duplicate_finder

A Comprehensive Solution for Identifying and Managing Duplicate Photos in Immich
Apache License 2.0

After deleting an asset, all assets are searched for duplicates. #21

Closed JW-CH closed 4 months ago

JW-CH commented 4 months ago

Problem: After duplicates have been found, deleting one of them triggers a new search for duplicates, which takes some time (26k assets, 16k indexed).

This makes the process of deleting duplicates really slow. Why is it necessary to search for duplicates again?

vale46n1 commented 4 months ago

Fully agree with you. This is the standard behavior of Streamlit, which is why I would eventually like to move to a different frontend such as React. Let's see if tonight I can improve the situation within Streamlit by adding a state that keeps what is currently displayed and removes only the deleted pair, without reloading everything. I guess it should be feasible...
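
A minimal sketch of the state idea described above, not the project's actual code. It assumes the state object is dict-like (in a Streamlit app, `st.session_state` could be passed in); the key `duplicate_pairs` and both function names are hypothetical.

```python
def get_duplicate_pairs(state, find_duplicates):
    """Run the expensive duplicate search once and cache the result in the state."""
    if "duplicate_pairs" not in state:
        # Only reached on the first call; later reruns reuse the cached list.
        state["duplicate_pairs"] = find_duplicates()
    return state["duplicate_pairs"]


def remove_asset(state, asset_id):
    """Drop every cached pair that involves the deleted asset, without searching again."""
    if "duplicate_pairs" in state:
        state["duplicate_pairs"] = [
            pair for pair in state["duplicate_pairs"] if asset_id not in pair
        ]
```

With this shape, a script rerun after a deletion only filters the cached list, so the remaining pairs stay on screen and the search is not repeated.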

vale46n1 commented 4 months ago

The process should now be more streamlined: creating the vector database is a one-time setup, which also simplifies updates with new photos.

Once the vector database is set up, you can create the duplicate database.

This separation of concerns ensures that each component is focused and manageable.

The function that finds duplicates in the database seems efficient: limiting the search results to 10 elements keeps the operations fast and simple.

Let me know if you have other suggestions.
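
To illustrate what limiting the search to 10 results means, here is a rough sketch under stated assumptions. It uses a plain linear scan with cosine similarity, whereas a real vector database would use its own index and query API; the function names, the `threshold` parameter, and the in-memory `vectors` dict are all hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbours(query_id, vectors, limit=10, threshold=0.95):
    """Return up to `limit` (asset_id, score) pairs most similar to `query_id`."""
    query = vectors[query_id]
    scored = (
        (cosine_similarity(query, vec), asset_id)
        for asset_id, vec in vectors.items()
        if asset_id != query_id
    )
    # Keep only the `limit` best candidates, then drop those below the threshold.
    top = heapq.nlargest(limit, scored)
    return [(asset_id, score) for score, asset_id in top if score >= threshold]
```

Capping the result count bounds the number of candidate pairs produced per asset, whatever the size of the library.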

ntropia2 commented 4 months ago

I landed here looking for a solution to my problem.

When selecting photos for deletion (about 4000), I can only pick the first one, get the confirmation that the asset has been deleted (both in the web interface and in the terminal where the server is started), but then if I click on a second photo, all remaining duplicates disappear.

The only thing I can do is click "Find duplicate photos" again and wait.

The same behavior occurs in a mobile browser and on desktop.

Is this the same issue? Is there a solution?

f-mc2 commented 4 months ago

I have the exact same problem, and it makes the procedure almost impossible when you have around 7000 duplicates.

I hope there will be a way to solve the issue.

vale46n1 commented 4 months ago

Sure. I'm developing a new version that solves this behavior. I'll also implement an extremely powerful video duplicate function.

f-mc2 commented 4 months ago

That's amazing! Thank you for the wonderful work!

I do not know if it makes sense to open another issue, but I just thought that, in addition to being able to delete duplicates of different assets without refreshing, it could also be helpful to have different duplicates of the same asset displayed at the same level. Let me know if it's advisable to open another issue and I'll do it.

ntropia2 commented 4 months ago

That's good news! I know this is an open source project and everything, but I would suggest that having a usable state to begin with is a more important goal, and that should come before adding new features. For example, before adding video support, it would be infinitely more useful to have a way to export the list of just photos in any format (JSON? CSV?) for further processing, or to have a simple "delete smaller/larger file" button. That's not only because photos usually outnumber videos 20:1, but also to establish this as a powerful and useful tool for the community.

Thanks for this great effort! Ciao
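
As a rough illustration of the export idea above, a CSV dump of the duplicate pairs could look like the following sketch. The column names and the `(asset_id_1, asset_id_2, similarity)` tuple layout are assumptions, not the tool's actual data model.

```python
import csv
import io


def pairs_to_csv(pairs):
    """Serialize (asset_id_1, asset_id_2, similarity) tuples to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["asset_id_1", "asset_id_2", "similarity"])
    for first, second, similarity in pairs:
        # Fixed precision keeps the file stable across runs and easy to diff.
        writer.writerow([first, second, f"{similarity:.4f}"])
    return buffer.getvalue()
```

The returned string can be written to a file or offered as a download, then processed with any spreadsheet or script.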

vale46n1 commented 4 months ago

Fully agree. For that reason, I would move to React, which is much more powerful than Streamlit.

vale46n1 commented 4 months ago

I've just implemented a new state to prevent the vector database from reloading. Please let me know if it's working properly. Regarding the new functions, feel free to submit your ideas in the discussion section. This way, I can review and potentially implement them. I'd be happy to do so!

ntropia2 commented 3 months ago

I'm not sure where the discussion section is; I couldn't find any mention of it in the repo.

I think this new version is a great improvement in usability. There is still a lot of clicking necessary, but I guess we have to wait patiently until you migrate to the new framework.

That said, there are a few low-hanging fruits you could implement and possibly keep even after the migration.

First, for the sake of a sane UX, it is frustrating that the smaller and larger files have no consistent placement and keep flipping between left and right. Combined with the tedious process of clicking several thousand times to delete duplicates, this makes the whole process very error prone (I accidentally deleted the wrong file far too many times).

Then, if you combine the ML similarity score with a comparison of some of the EXIF data, you could identify many obvious duplicates that could be deleted without supervision. For example, I noticed that many duplicates come from importing the same pictures from different Google Takeout sessions, or from mixing those with direct Google Photos downloads. This can generate files that are often pretty much identical (e.g., exactly the same timestamp to the second) but that can have different sizes. Deleting the smallest ones should be a no-brainer that could be automated without too much fuss and would speed up the process dramatically.

I'm happy to post these feature requests in the discussion forum too, if you can point me to it.
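
The rule proposed above (high similarity, identical EXIF timestamp, delete the smaller file) could be sketched roughly as follows. The `Asset` fields, the function name, and the `min_similarity` default are illustrative assumptions, not part of the tool or of the Immich API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Asset:
    asset_id: str
    taken_at: datetime  # EXIF capture time, to the second
    size_bytes: int


def pick_auto_deletable(
    first: Asset, second: Asset, similarity: float, min_similarity: float = 0.99
) -> Optional[Asset]:
    """Return the asset that could be deleted unattended, or None if a human should decide."""
    if similarity < min_similarity:
        return None  # not similar enough to trust the ML score alone
    if first.taken_at != second.taken_at:
        return None  # different capture times, possibly a burst or an edit
    if first.size_bytes == second.size_bytes:
        return None  # no size difference to base the choice on
    return first if first.size_bytes < second.size_bytes else second
```

Anything that returns `None` would stay in the manual review queue, so the automation only handles the unambiguous cases.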

f-mc2 commented 3 months ago

I cloned the updated repo, and used the old database, but the issue seems to persist. Am I doing something wrong? Should I proceed differently?