simulot / immich-go

An alternative to the immich-CLI command that doesn't depend on nodejs installation. It tries its best for importing google photos takeout archives.
GNU Affero General Public License v3.0

Duplicate function incorrectly removing images #209

Open deanillfeld opened 3 months ago

deanillfeld commented 3 months ago

Hello there,

I've found that the duplicate function detects an asset as a duplicate of itself (both copies have the same path), which results in the image being deleted completely.

If I immediately run the command again, it finds another batch of duplicates that also share the same path.

immich-go.exe -server http://192.168.1.11:8081 -key xxxxxxxxx duplicate -yes
immich-go  0.13.1, commit 6fb63f511170f0c903525ca03197eb94a13ef0ab, built at 2024-03-26T13:38:09Z
Server status: OK
Get server's assets...17921 received
2 duplicate(s) determined.
There are 2 copies of the asset IMG-20200809-WA0025.JPG.JPG, taken on 2020-08-09T10:00:00+10:00
  delete IMG-20200809-WA0025.jpg 1024x768, 74.6 KB, /photos/upload/77941cec-6312-4781-824a-e8c8997f29cb/45/02/4502c7d4-93d9-4740-946e-6109d06ea1a3.jpg
  keep   IMG-20200809-WA0025.jpg 1024x768, 74.6 KB, /photos/upload/77941cec-6312-4781-824a-e8c8997f29cb/45/02/4502c7d4-93d9-4740-946e-6109d06ea1a3.jpg
  Asset removed
There are 2 copies of the asset IMG-20210410-WA0022.JPG.JPG, taken on 2021-04-10T10:00:00+10:00
  delete IMG-20210410-WA0022.jpg 768x1024, 78.7 KB, /photos/upload/77941cec-6312-4781-824a-e8c8997f29cb/9b/36/9b36c434-efd9-4392-b860-6b924f998437.jpg
  keep   IMG-20210410-WA0022.jpg 768x1024, 78.7 KB, /photos/upload/77941cec-6312-4781-824a-e8c8997f29cb/9b/36/9b36c434-efd9-4392-b860-6b924f998437.jpg
  Asset removed

I've had a look through a few logs but nothing jumps out to me as relevant. Let me know if you need anything.

simulot commented 3 months ago

Oops...

I don't see an obvious flaw in the code. It looks as if the Immich server has listed the same photo twice. I need to do more tests.
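If the server really does return the same asset twice, one defensive measure would be to drop repeated IDs before grouping assets into duplicate candidates. The following is only a minimal sketch of that idea, not immich-go's actual code: the `asset` type, its fields, the `uniqueByID` helper, and the sample IDs are all hypothetical.

```go
package main

import "fmt"

// asset is a hypothetical, stripped-down stand-in for a server asset.
type asset struct {
	ID   string
	Name string
	Path string
}

// uniqueByID keeps the first occurrence of each asset ID and drops any
// later entry with the same ID, preserving the original order.
func uniqueByID(assets []asset) []asset {
	seen := make(map[string]struct{}, len(assets))
	out := make([]asset, 0, len(assets))
	for _, a := range assets {
		if _, ok := seen[a.ID]; ok {
			continue // same asset listed again: not a real duplicate
		}
		seen[a.ID] = struct{}{}
		out = append(out, a)
	}
	return out
}

func main() {
	// Made-up listing in which the first asset appears twice.
	listed := []asset{
		{ID: "id-1", Name: "IMG-20200809-WA0025.jpg", Path: "/photos/a.jpg"},
		{ID: "id-1", Name: "IMG-20200809-WA0025.jpg", Path: "/photos/a.jpg"},
		{ID: "id-2", Name: "IMG-20210410-WA0022.jpg", Path: "/photos/b.jpg"},
	}
	for _, a := range uniqueByID(listed) {
		fmt.Println(a.ID, a.Name, a.Path)
	}
}
```

With a guard like this, an asset could only be paired with a different asset, so "delete" and "keep" could never point at the same file.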

deanillfeld commented 3 months ago

I had a poke around /api/asset and counted how many times each originalPath appears. Nothing came up more than once. Is there anything else I can do that would help you investigate?
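For anyone who wants to repeat this check, here is a rough sketch of the counting step in Go. It assumes the /api/asset response has been saved as a JSON array whose objects carry an `originalPath` field; the `asset` struct, the `countPaths` helper, and the sample data are illustrative assumptions, not part of immich-go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// asset holds only the fields needed for this check. The JSON field
// names are assumed from the originalPath mentioned above.
type asset struct {
	ID           string `json:"id"`
	OriginalPath string `json:"originalPath"`
}

// countPaths decodes a JSON array of assets and returns how many
// times each originalPath occurs.
func countPaths(data []byte) (map[string]int, error) {
	var assets []asset
	if err := json.Unmarshal(data, &assets); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, a := range assets {
		counts[a.OriginalPath]++
	}
	return counts, nil
}

func main() {
	// Made-up sample; in practice this would be the saved API response.
	sample := []byte(`[
		{"id": "id-1", "originalPath": "/photos/a.jpg"},
		{"id": "id-2", "originalPath": "/photos/b.jpg"},
		{"id": "id-3", "originalPath": "/photos/a.jpg"}
	]`)
	counts, err := countPaths(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	// Report only paths that occur more than once.
	for path, n := range counts {
		if n > 1 {
			fmt.Printf("%s listed %d times\n", path, n)
		}
	}
}
```

If no path is reported, as in the check described above, the listing itself contains no repeated paths and the problem is more likely in how duplicates are grouped.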

simulot commented 3 months ago

I'll add some tests to the code and see if I can understand how this occurs.

LeeAStone commented 3 months ago

I'm also seeing this behaviour:

immich-go 0.13.2, commit 159e3819940075651cd966e6106f1a143bcaa364, built at 2024-04-01T20:53:35Z
Server status: OK
Get server's assets...42022 received
3325 duplicate(s) determined.
There are 2 copies of the asset SHOP_0079.JPG.JPG, taken on 2003-12-26T05:59:45Z
  delete shop_0079.jpg 352x288, 83.6 KB, upload/upload/d51ae252-ffc7-4328-9073-3c1457934764/95/86/95867ddf-ea83-435b-be3b-f1cb05582e44.jpg
  keep   shop_0079.jpg 352x288, 83.6 KB, upload/upload/d51ae252-ffc7-4328-9073-3c1457934764/95/86/95867ddf-ea83-435b-be3b-f1cb05582e44.jpg
Proceed? [n]/y:

Is there anything you'd like me to do to help debug this?

The asset name SHOP_0079.JPG.JPG seems pertinent, as the actual filename does not have a double .JPG at the end.

However, it does seem to work correctly for other files:

There are 2 copies of the asset CIMG1196.JPG, taken on 2006-11-05T19:38:51Z
  delete CIMG1196.JPG 1600x1200, 106.5 KB, upload/upload/d51ae252-ffc7-4328-9073-3c1457934764/82/72/82720527-c7f0-4bb5-abf1-74874505f1da.JPG
  keep   CIMG1196.JPG 2816x2112, 1.8 MB, upload/upload/d51ae252-ffc7-4328-9073-3c1457934764/96/8e/968e7b5b-c64c-4d5a-b860-f54886484b08.JPG
Proceed? [n]/y: n
There are 2 copies of the asset CIMG1197.JPG, taken on 2006-11-05T19:38:58Z
  delete CIMG1197.JPG 1600x1200, 52.5 KB, upload/upload/d51ae252-ffc7-4328-9073-3c1457934764/f7/45/f745427a-3da9-4e35-8826-7a374f46ce72.JPG
  keep   CIMG1197.JPG 2816x2112, 482.2 KB, upload/upload/d51ae252-ffc7-4328-9073-3c1457934764/95/9e/959e7716-e9ff-4cc9-996a-a7fae0db4642.JPG
Proceed? [n]/y: n
There are 2 copies of the asset CIMG1198.JPG, taken on 2006-11-05T19:39:05Z
  delete CIMG1198.JPG 1600x1200, 235.7 KB, upload/upload/d51ae252-ffc7-4328-9073-3c1457934764/97/7a/977a49c9-c8b9-4aef-b699-dd7b62fee33a.JPG
  keep   CIMG1198.JPG 2816x2112, 2.4 MB, upload/upload/d51ae252-ffc7-4328-9073-3c1457934764/3e/8e/3e8e060a-44c6-4c8e-a16c-ba7dba1d097c.JPG
Proceed? [n]/y: n
There are 2 copies of the asset CIMG1201.JPG, taken on 2006-11-05T19:39:23Z
  delete CIMG1201.JPG 1600x1200, 98.2 KB, upload/upload/d51ae252-ffc7-4328-9073-3c1457934764/41/02/4102d462-518d-4499-b8f8-7c84c32cc34a.JPG
  keep   CIMG1201.JPG 2816x2112, 1.4 MB, upload/upload/d51ae252-ffc7-4328-9073-3c1457934764/04/04/0404d47f-04f4-47e0-862b-48f97a53dba4.JPG

I think the difference is that the files that did not work probably don't have any embedded EXIF data.

I'll take a look at your code but I'm not much of a programmer.

simulot commented 3 months ago

I'm busy with the new user interface. I have seen similar cases in my tests. I'll investigate soon.

molnarti commented 1 month ago

Same problem here.

There are 2 copies of the asset DSC_0002.JPG.JPG, taken on 2015-03-06T11:52:26+01:00
  delete DSC_0002.JPG 3104x1746, 1.7 MB, /import/OneDrive_Camera_roll/DSC_0002.JPG
  keep   DSC_0002.JPG 3104x1746, 1.7 MB, /import/OneDrive_Camera_roll/DSC_0002.JPG

Also, this is far from an isolated case: the majority of my identified "duplicates" look like this.