gphotosuploader / gphotos-uploader-cli

Command line tool to mass upload media folders to your google photos account(s) (Mac OS / Linux)
https://gphotosuploader.github.io/gphotos-uploader-cli/
MIT License

Duplicate folder named albums created #192

Closed: nicholak closed this issue 4 years ago

nicholak commented 4 years ago

Describe the bug: Subsequent uploads appear to create duplicate albums named after the folder when not all files from a folder are uploaded in one batch.

To Reproduce: My steps were:

  1) Configure the config file with makeAlbums: enabled: true, use: folderName and deleteAfterUpload: false (sketched below).
  2) Perform the 1st upload. All OK.
  3) After exceeding the API limit, perform the upload again after 20 hours.
  4) Albums with the same name were created with the remaining files from each folder.
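
For reference, the relevant part of the job configuration would look roughly like this (account and paths are placeholders):

  jobs: [
    {
      account: "youraccount@gmail.com"
      sourceFolder: "/path/to/photos"
      makeAlbums: {
        enabled: true
        use: folderName
      }
      deleteAfterUpload: false
    },
  ]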

Expected behavior: A single album is created for each folder.

Screenshots: (screenshot attached: Selection_006)


Additional context: I have a 100GB account and am nowhere near the limit.

nicholak commented 4 years ago

BTW, this application is awesome! How can I support it?

rysi3k commented 4 years ago

I can also confirm duplicated albums ;(

pacoorozco commented 4 years ago

Hi @nicholak

Let me check the issue related to album duplication... I have an idea about what is happening, but I'd like to test it first. Hopefully I'll be able to do so later.

Regarding your kind offer to support the app...

Thanks

albertvaka commented 4 years ago

Just hit the same issue. Is there a patch I can test?

By the way, thanks a lot for writing this great tool in the first place :D

albertvaka commented 4 years ago

I have written a super ugly hack to fix this. The code is crap and it's not meant to be merged back upstream, but it did the job of fixing the albums for me.

https://github.com/albertvaka/gphotos-uploader-cli

Some notes:

pacoorozco commented 4 years ago

Hi @albertvaka

Good job implementing this new command to deduplicate albums with the same name. I think we can use it to fix the already-created albums, but this isn't fixing the bug itself, right?

I mean, there's a race condition that makes gphotos-uploader-cli create an album that already exists. In order to use your code, I think we should address the bug too.
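
Roughly, the pattern I suspect is a classic check-then-create (this is just an illustration, not the actual code; the albumsService interface is made up):

package albums

// albumsService is a made-up stand-in for the Google Photos client calls.
type albumsService interface {
    findAlbumByName(title string) (id string, found bool, err error)
    createAlbum(title string) (id string, err error)
}

// getOrCreateAlbumByName lists albums and creates one only if the title is
// missing. Two workers processing files for the same folder can both fail to
// find the album and both reach the create step, producing two albums with
// the same title.
func getOrCreateAlbumByName(svc albumsService, title string) (string, error) {
    id, found, err := svc.findAlbumByName(title)
    if err != nil {
        return "", err
    }
    if found {
        return id, nil
    }
    return svc.createAlbum(title)
}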

albertvaka commented 4 years ago

Hi @pacoorozco,

This isn't supposed to fix the bug, but it fixes the affected (duplicated) albums once the bug has already occurred. My code is just a big hack and it's not meant to replace the actual fix. I shared it in case it was useful to more people, but I don't expect it to be merged into the project :)

Albert

pacoorozco commented 4 years ago

Hi, I'm trying to reproduce this issue and I'm not able to.

I have used the script with 100 different files against the same album name and it worked perfectly. Do you have a way to reproduce it?

grumpyjames commented 4 years ago

I have also found that this happens.

I was uploading a batch of maybe 2,000 photos in 52 directories. During the upload, I encountered a couple of API limit exceeded errors, but only for a couple of photos. More than two albums had duplicate entries though, and they weren't restricted to those where a photo had breached an API limit.

You mentioned this might be a race. I don't know if this affects how you're thinking about the problem, but in case it does: my upstream bandwidth is not huge, so it probably takes 3-4 seconds to upload each photo.

pacoorozco commented 4 years ago

Hi @grumpyjames

Thanks for adding your experience. I'm trying to understand where the problem is. Would you answer two questions?

grumpyjames commented 4 years ago
  1. I found one of the albums had most of the photos, and the other would only have one or two photos in it.
  2. The album names did not exist before running the CLI
pacoorozco commented 4 years ago

@grumpyjames, Can you share with me the name of the albums?

grumpyjames commented 4 years ago

Yep :-). These were the names:

20190720-moare
20191030-nevern
20191122-antinho-and-old-goa
pacoorozco commented 4 years ago

It's really strange... I've reviewed the code several times and I'm not able to see where the album duplication is possible... and I can't reproduce it in my environment... :cry:

albertvaka commented 4 years ago

I can reproduce it 100% with these 4 files: https://www.dropbox.com/s/7yp8xc9wt9m92xj/test.tar.gz?dl=0

Consistently, the job for the second file (a video, which is a bit larger than the rest of the files... not sure if that's relevant though) creates a duplicate album (with a different ID than the album used for the other 3 files).

My config is:

  jobs: [
   {
      account: "albertvaka@gmail.com"
      sourceFolder: "/path/to/test/directory"
      makeAlbums: {
        enabled: true
        use: folderName
      }
      deleteAfterUpload: false
      includePatterns: [ "_ALL_VIDEO_FILES_", "*.jpg", "*.GIF", "*.MOV", "*.mov", "*.gif", "*.png", "*.jpeg", "*.JPEG", "*.JPG", "*.Jpg", "*.Jpeg", "*.PNG", "*.mpeg", "*.AVI", "*.MPG", "*.mpg", "*.3GP", "*.3gp", "*.mp4", "*.MP4", "*.webm", "*.WEBM" ]
      excludePatterns: [  ]
    },
  ]
pacoorozco commented 4 years ago

Thanks @albertvaka for the test case...

Unfortunately, it worked without any issue. I mean, the 4 files ended up inside one album :roll_eyes:

There is something different between our setups...

Thanks.

albertvaka commented 4 years ago

Maybe Albums.Create is eventually consistent? I found that after calling it, the next loop through all the albums done in AlbumByName doesn't contain the album that was just created (so, we create a new one). I have ~650 albums, so maybe that makes the problem happen more often for me?

If that's the case, a solution would be to cache the albums we create, and get them from the internal cache instead of fetching them from the server. This would also greatly speed up the program... since going through 14 pages of albums each time to find the album we just created is rather slow.
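
Something like this, just as a sketch (the albumsAPI interface and its methods are made-up stand-ins for the real client, not the library's actual API):

package albums

import "sync"

// albumsAPI models only the two calls this sketch needs.
type albumsAPI interface {
    FindByName(title string) (id string, found bool, err error)
    Create(title string) (id string, err error)
}

// CachedAlbums remembers the ID of every album it has already resolved, so
// an album that was just created is never looked up again on the (possibly
// eventually consistent) server.
type CachedAlbums struct {
    api    albumsAPI
    mu     sync.Mutex
    byName map[string]string // album title -> album ID
}

func NewCachedAlbums(api albumsAPI) *CachedAlbums {
    return &CachedAlbums{api: api, byName: make(map[string]string)}
}

// GetOrCreate returns the album ID for title, creating the album at most
// once per run even when several upload workers call it concurrently.
func (c *CachedAlbums) GetOrCreate(title string) (string, error) {
    c.mu.Lock()
    defer c.mu.Unlock()

    if id, ok := c.byName[title]; ok {
        return id, nil
    }
    id, found, err := c.api.FindByName(title)
    if err != nil {
        return "", err
    }
    if !found {
        if id, err = c.api.Create(title); err != nil {
            return "", err
        }
    }
    c.byName[title] = id
    return id, nil
}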

I'm using the latest master. This is the output with --debug (timestamps added by me so you can see where it takes longer; the GetOrCreateAlbumByName lines were also added by me so you can see how a different ID is returned):

20-08-09 10:57:36 [debug]  Worker #2 is starting
20-08-09 10:57:36 [debug]  Worker #4 is starting
20-08-09 10:57:36 [debug]  Worker #1 is starting
20-08-09 10:57:36 [debug]  Worker #3 is starting
20-08-09 10:57:36 [debug]  Worker #5 is starting
20-08-09 10:57:37 [debug]  Getting OAuth token for albertvaka@gmail.com
20-08-09 10:57:37 [debug]  Token expiration: 2020-08-09 11:08:27.880597999 +0200 CEST
20-08-09 10:57:37 [info]   Found item to upload: /tmp/test/test9/IMG_20140227_215903.jpg
20-08-09 10:57:37 [info]   Found item to upload: /tmp/test/test9/RelampagoAkademy2013.webm
20-08-09 10:57:37 [info]   Found item to upload: /tmp/test/test9/Screenshot 2020-07-11 14.07.39.png
20-08-09 10:57:37 [info]   Found item to upload: /tmp/test/test9/Screenshot 2020-07-30 22.28.22.png
20-08-09 10:57:37 [info]   4 files pending to be uploaded in folder '/tmp/test'.
20-08-09 10:57:37 [debug]  Worker #2 processing: /tmp/test/test9/Screenshot 2020-07-30 22.28.22.png
20-08-09 10:57:37 [debug]  Worker #4 processing: /tmp/test/test9/IMG_20140227_215903.jpg
20-08-09 10:57:37 [debug]  Worker #3 processing: /tmp/test/test9/Screenshot 2020-07-11 14.07.39.png
20-08-09 10:57:37 [debug]  Worker #1 processing: /tmp/test/test9/RelampagoAkademy2013.webm
20-08-09 10:57:55 [debug]  GetOrCreateAlbumByName returns id = ADV8uHTVHpgw9h0ergzQtq1NlfrFPdook4piRoq4mDyUcXrajzasZ21lXd7C9DoFAFcSUJS_ICQb
20-08-09 10:57:57 [debug]  Successfully processing /tmp/test/test9/Screenshot 2020-07-30 22.28.22.png
20-08-09 10:58:12 [debug]  GetOrCreateAlbumByName returns id = ADV8uHSoCSDwakQ0rUSqi9iAeIdLUYhdk9813kVjsi9fgKL0TXJ08HBU5hEXmFYDUjAJXxL1dIL4
20-08-09 10:58:13 [debug]  GetOrCreateAlbumByName returns id = ADV8uHTVHpgw9h0ergzQtq1NlfrFPdook4piRoq4mDyUcXrajzasZ21lXd7C9DoFAFcSUJS_ICQb
20-08-09 10:58:14 [debug]  Successfully processing /tmp/test/test9/IMG_20140227_215903.jpg
20-08-09 10:58:15 [debug]  GetOrCreateAlbumByName returns id = ADV8uHTVHpgw9h0ergzQtq1NlfrFPdook4piRoq4mDyUcXrajzasZ21lXd7C9DoFAFcSUJS_ICQb
20-08-09 10:58:15 [debug]  Successfully processing /tmp/test/test9/Screenshot 2020-07-11 14.07.39.png
20-08-09 10:58:17 [debug]  Successfully processing /tmp/test/test9/RelampagoAkademy2013.webm
20-08-09 10:58:17 [info]   4 processed files: 4 successfully, 0 with errors
20-08-09 10:58:17 [debug]  Worker #1 is stopping
20-08-09 10:58:17 [debug]  Worker #2 is stopping
20-08-09 10:58:17 [debug]  Worker #3 is stopping
20-08-09 10:58:17 [debug]  Worker #4 is stopping
20-08-09 10:58:17 [debug]  Worker #5 is stopping
20-08-09 10:58:17 [debug]  Shutting down File Tracker service...
20-08-09 10:58:17 [debug]  Shutting down Upload Tracker service...
20-08-09 10:58:17 [debug]  Shutting down Token Manager service...
20-08-09 10:58:17 [debug]  All services has been shut down successfully
pacoorozco commented 4 years ago

That helps a lot, @albertvaka. It seems the bug could be related to some timeout... let me dig deeper before adding a local cache.

I was trying to avoid a local cache; it's always a nightmare with concurrency and HTTP clients.

albertvaka commented 4 years ago

In case you want to go with the local cache route, I've written a patch that adds one and I've checked it fixes this issue:

https://github.com/gphotosuploader/google-photos-api-client-go/pull/36

I think the cache is also an improvement for performance reasons (e.g. it's very easy to hit the API rate limit with the current code), but feel free to discard my patch if you think otherwise.

grumpyjames commented 4 years ago

Apologies in advance: I'm not a Go programmer, so my reading of the existing code may be a little hazy, or my proposed solution could be completely unworkable.

It looks like we create a job for each photo in an album, and then each of those jobs calls:

  album, err := job.PhotosService.GetOrCreateAlbumByName(job.AlbumName)

Knowing this, we could cache locally in the photos service. Alternatively, we could group uploads by album?

For example, at the moment this file layout:

album-0/1.jpg
album-0/2.jpg
album-1/1.jpg
album-1/2.jpg
album-1/3.jpg

generates a queue that looks like this:

create album-0 if !exists, upload album-0/1.jpg
create album-0 if !exists, upload album-0/2.jpg
create album-1 if !exists, upload album-1/1.jpg
create album-1 if !exists, upload album-1/2.jpg
create album-1 if !exists, upload album-1/3.jpg

Which we now know can cause problems, because we think create album is eventually consistent.

Could we instead have a queue that looks like:

create album-0 if !exists
upload album-0/1.jpg
upload album-0/2.jpg
create album-1 if !exists
upload album-1/1.jpg
upload album-1/2.jpg
upload album-1/3.jpg

That avoids the need for a local cache, and calls create album at most once per CLI run. It would probably need us to have separate EnqueuedFileUpload and EnqueuedCreateAlbum structs, though.

I'll see if I can get something like this working locally, but given this is the first Go I've ever written, I thought it might be worth describing it first :-)

EDIT: aha, yes, this doesn't work because we need the album ID, not the album name, so we have to find some way for the upload-file tasks to know the album ID...

Continued EDIT:

And so a way to do this is probably to have the create-album task spawn the upload-photo tasks with the correct album ID once it completes successfully (and we can avoid the create-album task altogether and go straight to upload if the configuration specifies it).
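
Roughly this shape (made-up types and simplified signatures, just to show the idea, not the project's actual job structs):

package upload

// photosService is a stand-in for the real client; the signatures here are
// simplified for illustration.
type photosService interface {
    GetOrCreateAlbumByName(title string) (albumID string, err error)
    AddFileToAlbum(albumID, path string) error
}

// uploadJob already carries the album ID, so it never has to resolve the
// album name itself.
type uploadJob struct {
    svc     photosService
    albumID string
    path    string
}

func (j uploadJob) Run() error {
    return j.svc.AddFileToAlbum(j.albumID, j.path)
}

// createAlbumJob is enqueued once per folder; when it completes it fans out
// one uploadJob per file, each with the resolved album ID filled in.
type createAlbumJob struct {
    svc       photosService
    albumName string
    files     []string
}

func (j createAlbumJob) Run(enqueue func(uploadJob)) error {
    albumID, err := j.svc.GetOrCreateAlbumByName(j.albumName)
    if err != nil {
        return err
    }
    for _, f := range j.files {
        enqueue(uploadJob{svc: j.svc, albumID: albumID, path: f})
    }
    return nil
}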

grumpyjames commented 4 years ago

I've had a go at this here:

https://github.com/grumpyjames/gphotos-uploader-cli/commit/ff18a38e408d4ed74f8a892af47ad3a9cd133ac0

If we like this approach, I can probably try to put some tests around it (and fix the ones I broke!). Alternatively, feel free to use it as inspiration in your own change. Or, if we hate it, please feel free to ignore it altogether :-)

pacoorozco commented 4 years ago

I'm trying to fix it using a different approach: caching album information (mainly the ID) to avoid the Google Photos inconsistency on album creation. As a bonus, it also reduces the number of calls to Google Photos.

@grumpyjames can you review it, please: #222

pacoorozco commented 4 years ago

The work is already done here. I'm going to release v2 of the package and then I will modify gphotos-uploader-cli to use it.

v2 has a lot of improvements in performance and error handling. It uses a cache to reduce the number of calls to Google Photos, which also fixes this error.

pacoorozco commented 4 years ago

This bug should be fixed in the latest version. Can you confirm it?

jendrikseipp commented 3 years ago

I'm using v2.0 and I'm getting lots of duplicate album titles. The fix above (https://github.com/gphotosuploader/gphotos-uploader-cli/issues/192#issuecomment-667660880) still works (with Go 1.13), but obviously it would be nice not to have duplicate album names in the first place. (Apart from this, the app works really well, thanks!)

pacoorozco commented 3 years ago

Hi @jendrikseipp

Apologies for the behavior. Can you send me some logs from the CLI with debug mode enabled so I can better understand what is happening?

Thanks in advance

jendrikseipp commented 3 years ago

Thanks for looking into this! I sent you the debug log in an email.