simulot / immich-go

An alternative to the immich-CLI command that doesn't depend on nodejs installation. It tries its best for importing google photos takeout archives.
GNU Affero General Public License v3.0

Question: report shows way less images uploaded than scanned #390

Closed joselsegura closed 1 month ago

joselsegura commented 1 month ago

Hi immich-go team!

I discovered immich-go a few days ago and have been using it to upload my Google Photos backup to my Immich instance. I downloaded 3 Google Takeout tgz archives from the Takeout service and uncompressed them, as the README and instructions say, in order to run immich-go over them.

I put all 3 directories together inside the same directory and ran immich-go -server https://XXXXXXXXX -key ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ upload -create-albums -google-photos ..

After a while, the upload was finished and the report was shown:

Input analysis:
---------------
scanned image file                      :   48682
scanned video file                      :    1835
scanned sidecar file                    :   48483
discarded file                          :       0
unsupported file                        :     105
file duplicated in the input            :    4168
associated metadata file                :   25771
missing associated metadata file        :   24746

Uploading:
----------
uploaded                                :   20749
upload error                            :       2
file not selected                       :       4
server's asset upgraded with the input  :       0
server has same asset                   :     844
server has a better asset               :       2

As you can see, it scanned more than 50k photos+videos, but the report says that "only" ~21k were uploaded to my server. Did I do something wrong in my run? Should I repeat the execution and expect something different, or is this expected?
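The counters in the report above can be tallied to see where the gap sits. This is a minimal sketch that uses only the numbers from the report; the variable names are made up for illustration, and grouping the counters this way is an assumption, not documented immich-go behaviour.

```shell
#!/usr/bin/env bash
# Counters copied from the report above.
# Assumption: variable names are hypothetical, not immich-go identifiers.
scanned_images=48682
scanned_videos=1835
duplicated_in_input=4168
with_metadata=25771
missing_metadata=24746
uploaded=20749
upload_error=2
not_selected=4
server_same=844
server_better=2

# Total media files scanned.
scanned=$((scanned_images + scanned_videos))
# Files with a metadata file plus files without one.
metadata_total=$((with_metadata + missing_metadata))
# Everything listed in the "Uploading" section.
upload_section=$((uploaded + upload_error + not_selected + server_same + server_better))

echo "scanned media files        : $scanned"
echo "with + missing metadata    : $metadata_total"
echo "upload section total       : $upload_section"
echo "upload section + duplicates: $((upload_section + duplicated_in_input))"
```

With these numbers, the files with and without a metadata file add up exactly to the scanned total (50517), and the upload section plus the input duplicates comes to 25769, close to the 25771 files that had a metadata file. That suggests, but does not prove, that the 24746 files missing a metadata file account for most of the gap.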

simulot commented 1 month ago

Regarding "tgz": direct support of the TGZ format requires reading and decompressing the archive twice. It is better to process the result of the decompression.

Regarding "scanned more than 50k photos+video but only ~21k were uploaded":

First, the Google Takeout is full of duplicates. That explains part of the gap.

Immich-go is also confused by iPhone file names, which create duplicates. IMG_1234.HEIC appears several times in the archive, and is sometimes confused with IMG_1234.JPG and IMG_1234_MP4.

I'm working on the issue.

You can help by providing logs and debug files. For more privacy, you can send them via Discord to @simulot.

immich-go -server https://XXXXXXXXX -key ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ upload -debug-counters -create-albums -google-photos 

Stay tuned

joselsegura commented 1 month ago

Hi!

I didn't use the direct tgz support. I decompressed the files and ran immich-go over the decompressed directory.

I ran some numbers using fdupes to find the duplicates, and the result makes sense. I don't have an iPhone, so I don't have the HEIC file problem at all.
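For readers without fdupes, a comparable duplicate count can be sketched with standard tools. This assumes GNU coreutils `sha256sum` is available; `count_duplicate_files` is a hypothetical helper name, and counting byte-identical files may not match the criteria immich-go uses for "file duplicated in the input".

```shell
#!/usr/bin/env bash
# count_duplicate_files DIR
# Prints how many files under DIR are byte-identical copies of another file
# (for each group of identical files, every file beyond the first is counted).
# Assumption: GNU coreutils sha256sum; the function name is made up for this sketch.
count_duplicate_files() {
  find "$1" -type f -exec sha256sum {} + |
    awk '{ seen[$1]++ }
         END { dup = 0; for (h in seen) dup += seen[h] - 1; print dup }'
}
```

Running `count_duplicate_files .` in the directory that holds the extracted takeout prints a single number that can be compared with the duplicate counter in the report.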

simulot commented 1 month ago

I'm rewriting the Google Photos import. I can work on your logs if you agree.

joselsegura commented 1 month ago

Sure, I don't have any problem sharing my logs with you. They are huge as I have a ton of pics there...

simulot commented 1 month ago

You can also share just the list of your files, not their content. Run the following command:

for f in *.zip; do echo "$f: "; unzip -l "$f"; done >list.lst
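The loop above lists ZIP archives. When the takeout has already been extracted, as in the original report, a similar list can be produced from the directory itself. This is only a sketch: `list_files` is a hypothetical helper, and whether this output format suits the maintainer is an assumption.

```shell
#!/usr/bin/env bash
# list_files DIR
# Prints the relative path of every file under DIR, sorted.
# Only names are listed; file contents are never read.
# Assumption: the function name is made up for this sketch.
list_files() {
  (cd "$1" && find . -type f | LC_ALL=C sort)
}

# Example (path is a placeholder): list_files /path/to/takeout >list.lst
```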

cocoands commented 1 month ago

@simulot Let me know if you need another list of file names. I had a very similar experience on v0.20.1.

Input analysis:
---------------
scanned image file                      :    2832
scanned video file                      :      89
scanned sidecar file                    :     769
discarded file                          :       0
unsupported file                        :       0
file duplicated in the input            :       0
associated metadata file                :       7
missing associated metadata file        :    2914

Uploading:
----------
uploaded                                :       4
upload error                            :       0
file not selected                       :       0
server's asset upgraded with the input  :       0
server has same asset                   :       3
server has a better asset               :       0

simulot commented 1 month ago

The report shows 2914 photos not associated with a JSON file.

This is unusual. Those files are ignored.

There are 2 main causes:

  1. You have processed only one part of the takeout. Process all parts together with takeout-*.zip.
  2. These JSON files are missing from the takeout. Ask for another takeout.

The next version of immich-go will give this advice.
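A rough check of an extracted takeout, before uploading, can show whether many media files lack a JSON file. The sketch below assumes the JSON file is named exactly `<media file name>.json` and sits next to the media file; real takeouts also use other naming variants, so the count is only an indicator. `count_missing_sidecars` and the extension list are hypothetical choices, not part of immich-go.

```shell
#!/usr/bin/env bash
# count_missing_sidecars DIR
# Prints how many media files under DIR have no "<file>.json" next to them.
# Assumptions: the sidecar is named "<media file name>.json"; only the
# extensions listed below are treated as media; no newlines in file names.
count_missing_sidecars() {
  find "$1" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.heic' \
      -o -iname '*.png' -o -iname '*.mp4' -o -iname '*.mov' \) -print |
    while IFS= read -r media; do
      # Emit one line per media file without a sidecar.
      [ -e "$media.json" ] || echo "$media"
    done | wc -l | tr -d ' '
}
```

A count close to the number of scanned files, as in the report above, would be consistent with either cause listed: a takeout part that was not processed, or JSON files missing from the takeout.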

cocoands commented 1 month ago

You are absolutely right. Complete user error. I needed to pay closer attention to the Google Photos advice in the README: use takeout-*.zip. Thanks for the quick response, and thank you for building this tool!