Closed: nicholak closed this issue 4 years ago
BTW, this application is awesome! How can I support it?
I can also confirm duplicated albums ;(
Hi @nicholak
Let me check the issue related to album duplication... I have an idea about what is happening, but I'd like to test it first. Hopefully I'll be able to later.
In regards to your kind offer to support the app...
Thanks
Just hit the same issue, is there a patch I can test?
By the way, thanks a lot for writing this great tool in the first place :D
I have written a super ugly hack to fix this. The code is crap and it's not meant to be merged back upstream, but it did the job of fixing the albums for me.
https://github.com/albertvaka/gphotos-uploader-cli
Some notes: run gphotos-uploader-cli dedupe to use it.
Hi @albertvaka
Good job implementing this new command to deduplicate albums with the same name. I think we can use it to fix the already-created albums, but this doesn't fix the bug itself, right?
I mean, there's a race condition that makes gphotos-uploader-cli create an album that already exists. In order to use your code, I think we should address the bug too.
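The race being described here is the classic check-then-act pattern: two upload workers both fail to find the album in the list, so both create it. A minimal, self-contained illustration of that failure mode (these are hypothetical names, not the project's real code; listContains always returns false to model a list endpoint that has not yet indexed a freshly created album):

```go
package main

import (
	"fmt"
	"sync"
)

// simulateRace models the check-then-create pattern: each worker lists the
// albums and creates one when the title is missing. listContains always
// returns false here, modelling an eventually consistent list endpoint
// that does not yet include a freshly created album.
func simulateRace(workers int) int {
	var mu sync.Mutex
	created := 0
	listContains := func(title string) bool { return false }
	createAlbum := func(title string) {
		mu.Lock()
		created++
		mu.Unlock()
	}
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if !listContains("test9") {
				createAlbum("test9")
			}
		}()
	}
	wg.Wait()
	return created
}

func main() {
	// Two concurrent workers both miss the album and both create it,
	// producing duplicate albums with the same title.
	fmt.Println(simulateRace(2))
}
```

Nothing here is specific to Google Photos; any "list, then create if missing" sequence without coordination between workers has this window.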
Hi @pacoorozco,
This isn't supposed to fix the bug, but it fixes the affected (duplicated) albums once the bug has already occurred. My code is just a big hack and it's not meant to replace the actual fix. I shared it in case it was useful to more people, but I don't expect it to be merged into the project :)
Albert
Hi I'm trying to reproduce this issue and I'm not able to do it.
I have used the script with 100 different files against the same album name and it worked perfectly. Do you have a way to reproduce it?
I have also found that this happens.
I was uploading a batch of maybe 2,000 photos in 52 directories. During the upload, I encountered a couple of API limit exceeded errors, but only for a couple of photos. More than two albums had duplicate entries though, and they weren't restricted to those where a photo had breached an API limit.
You mentioned this might be a race: I don't know if this affects how you might be thinking about the problem, but in case it does, you should know that my upstream bandwidth is not huge; it probably takes 3-4 seconds to upload each photo.
Hi @grumpyjames
Thanks for adding your experience; I'm trying to understand where the problem is. Would you answer two questions?
@grumpyjames, Can you share with me the name of the albums?
Yep :-). These were the names:
20190720-moare
20191030-nevern
20191122-antinho-and-old-goa
It's really strange... I've reviewed the code several times and I'm not able to see where the album duplication could happen... and I can't reproduce it in my environment... :cry:
I can reproduce it 100% with these 4 files: https://www.dropbox.com/s/7yp8xc9wt9m92xj/test.tar.gz?dl=0
Consistently, the job for the second file (a video, which is a bit larger than the rest of the files... not sure if that's relevant though) creates a duplicate album (with a different ID than the one used for the other 3 files).
My config is:
jobs: [
{
account: "albertvaka@gmail.com"
sourceFolder: "/path/to/test/directory"
makeAlbums: {
enabled: true
use: folderName
}
deleteAfterUpload: false
includePatterns: [ "_ALL_VIDEO_FILES_", "*.jpg", "*.GIF", "*.MOV", "*.mov", "*.gif", "*.png", "*.jpeg", "*.JPEG", "*.JPG", "*.Jpg", "*.Jpeg", "*.PNG", "*.mpeg", "*.AVI", "*.MPG", "*.mpg", "*.3GP", "*.3gp", "*.mp4", "*.MP4", "*.webm", "*.WEBM" ]
excludePatterns: [ ]
},
]
Thanks @albertvaka for the test case...
Unfortunately, it worked without any issue. I mean, the 4 files were created inside one album :roll_eyes:
There is something different between our setups...
Can you share the output of gphotos-uploader-cli version?
Can you run it with the --debug flag and paste the output?
Thanks.
Maybe Albums.Create is eventually consistent? I found that after calling it, the next loop through all the albums done in AlbumByName doesn't contain the album that was just created (so we create a new one). I have ~650 albums, so maybe that makes the problem happen more often for me?
If that's the case, a solution would be to cache the albums we create, and get them from the internal cache instead of fetching them from the server. This would also greatly speed up the program... since going through 14 pages of albums each time to find the album we just created is rather slow.
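The caching idea above can be sketched in a few lines. This is not the project's actual API (AlbumCache, GetOrCreate, and the create callback are all illustrative names): the point is simply that any album created during this run is remembered locally, so a later lookup never depends on the eventually consistent server-side list:

```go
package main

import (
	"fmt"
	"sync"
)

// AlbumCache is a hypothetical in-process cache keyed by album title.
// The names here are illustrative only, not the real client's API.
type AlbumCache struct {
	mu     sync.Mutex
	byName map[string]string          // album title -> album ID
	create func(title string) string  // remote create call (stubbed in this sketch)
}

// GetOrCreate returns the cached ID when the album was already created
// during this run, avoiding a re-fetch that the eventually consistent
// list endpoint may not yet include. The mutex makes it safe to call
// from several upload workers at once.
func (c *AlbumCache) GetOrCreate(title string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if id, ok := c.byName[title]; ok {
		return id
	}
	id := c.create(title)
	c.byName[title] = id
	return id
}

func main() {
	calls := 0
	cache := &AlbumCache{
		byName: map[string]string{},
		create: func(title string) string {
			calls++
			return fmt.Sprintf("id-%d", calls)
		},
	}
	a := cache.GetOrCreate("test9")
	b := cache.GetOrCreate("test9")
	fmt.Println(a == b, calls) // same ID both times, remote create called once
}
```

As a bonus, this also skips re-paging through every album on each upload, which addresses the performance concern mentioned above.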
I'm using latest master; this is an output with --debug (timestamps added by me so you can see where it takes longer, plus GetOrCreateAlbumByName lines also added by me so you can see how a different ID is returned):
20-08-09 10:57:36 [debug] Worker #2 is starting
20-08-09 10:57:36 [debug] Worker #4 is starting
20-08-09 10:57:36 [debug] Worker #1 is starting
20-08-09 10:57:36 [debug] Worker #3 is starting
20-08-09 10:57:36 [debug] Worker #5 is starting
20-08-09 10:57:37 [debug] Getting OAuth token for albertvaka@gmail.com
20-08-09 10:57:37 [debug] Token expiration: 2020-08-09 11:08:27.880597999 +0200 CEST
20-08-09 10:57:37 [info] Found item to upload: /tmp/test/test9/IMG_20140227_215903.jpg
20-08-09 10:57:37 [info] Found item to upload: /tmp/test/test9/RelampagoAkademy2013.webm
20-08-09 10:57:37 [info] Found item to upload: /tmp/test/test9/Screenshot 2020-07-11 14.07.39.png
20-08-09 10:57:37 [info] Found item to upload: /tmp/test/test9/Screenshot 2020-07-30 22.28.22.png
20-08-09 10:57:37 [info] 4 files pending to be uploaded in folder '/tmp/test'.
20-08-09 10:57:37 [debug] Worker #2 processing: /tmp/test/test9/Screenshot 2020-07-30 22.28.22.png
20-08-09 10:57:37 [debug] Worker #4 processing: /tmp/test/test9/IMG_20140227_215903.jpg
20-08-09 10:57:37 [debug] Worker #3 processing: /tmp/test/test9/Screenshot 2020-07-11 14.07.39.png
20-08-09 10:57:37 [debug] Worker #1 processing: /tmp/test/test9/RelampagoAkademy2013.webm
20-08-09 10:57:55 [debug] GetOrCreateAlbumByName returns id = ADV8uHTVHpgw9h0ergzQtq1NlfrFPdook4piRoq4mDyUcXrajzasZ21lXd7C9DoFAFcSUJS_ICQb
20-08-09 10:57:57 [debug] Successfully processing /tmp/test/test9/Screenshot 2020-07-30 22.28.22.png
20-08-09 10:58:12 [debug] GetOrCreateAlbumByName returns id = ADV8uHSoCSDwakQ0rUSqi9iAeIdLUYhdk9813kVjsi9fgKL0TXJ08HBU5hEXmFYDUjAJXxL1dIL4
20-08-09 10:58:13 [debug] GetOrCreateAlbumByName returns id = ADV8uHTVHpgw9h0ergzQtq1NlfrFPdook4piRoq4mDyUcXrajzasZ21lXd7C9DoFAFcSUJS_ICQb
20-08-09 10:58:14 [debug] Successfully processing /tmp/test/test9/IMG_20140227_215903.jpg
20-08-09 10:58:15 [debug] GetOrCreateAlbumByName returns id = ADV8uHTVHpgw9h0ergzQtq1NlfrFPdook4piRoq4mDyUcXrajzasZ21lXd7C9DoFAFcSUJS_ICQb
20-08-09 10:58:15 [debug] Successfully processing /tmp/test/test9/Screenshot 2020-07-11 14.07.39.png
20-08-09 10:58:17 [debug] Successfully processing /tmp/test/test9/RelampagoAkademy2013.webm
20-08-09 10:58:17 [info] 4 processed files: 4 successfully, 0 with errors
20-08-09 10:58:17 [debug] Worker #1 is stopping
20-08-09 10:58:17 [debug] Worker #2 is stopping
20-08-09 10:58:17 [debug] Worker #3 is stopping
20-08-09 10:58:17 [debug] Worker #4 is stopping
20-08-09 10:58:17 [debug] Worker #5 is stopping
20-08-09 10:58:17 [debug] Shutting down File Tracker service...
20-08-09 10:58:17 [debug] Shutting down Upload Tracker service...
20-08-09 10:58:17 [debug] Shutting down Token Manager service...
20-08-09 10:58:17 [debug] All services has been shut down successfully
That helps a lot, @albertvaka. It seems the bug could be related to some timeout... let me dig deeper before adding a local cache.
I was trying to avoid a local cache; caches are always a nightmare with concurrency and HTTP clients.
In case you want to go with the local cache route, I've written a patch that adds one and I've checked it fixes this issue:
https://github.com/gphotosuploader/google-photos-api-client-go/pull/36
I think the cache is an improvement also for performance reasons (eg: it's very easy to hit the API rate limit with the current code), but feel free to discard my patch if you think otherwise.
Apologies in advance, I'm not a go programmer, so my reading of the existing code may be a little hazy, or my proposed solution could be completely unworkable.
It looks like we create a job for each photo in an album, and then each of those jobs calls:
album, err := job.PhotosService.GetOrCreateAlbumByName(job.AlbumName)
We could cache locally in the photos service if we know this. Alternatively, we could group uploads by album?
For example, at the moment this file layout:
album-0/1.jpg
album-0/2.jpg
album-1/1.jpg
album-1/2.jpg
album-1/3.jpg
generates a queue that looks like this:
create album-0 if !exists, upload album-0/1.jpg
create album-0 if !exists, upload album-0/2.jpg
create album-1 if !exists, upload album-1/1.jpg
create album-1 if !exists, upload album-1/2.jpg
create album-1 if !exists, upload album-1/3.jpg
Which we now know can cause problems, because we think create album is eventually consistent.
Could we instead have a queue that looks like:
create album-0 if !exists
upload album-0/1.jpg
upload album-0/2.jpg
create album-1 if !exists
upload album-1/1.jpg
upload album-1/2.jpg
upload album-1/3.jpg
That avoids the need for a local cache, and calls create album at most once per CLI run. It would probably need us to have separate EnqueuedFileUpload and EnqueuedCreateAlbum structs, though.
I'll see if I can get something like this working locally, but given this is the first go I've ever written, I thought it might be worth describing it first :-)
EDIT: aha, yes, this doesn't work because we need the album id, not the album name, so we have to find some way for the upload-file tasks to know the album id...
Continued EDIT: and so a way to do this is probably to have the create-album task spawn upload-photo tasks with the correct album id when it completes successfully. (And we can avoid the create-album task altogether and go straight to upload if the configuration specifies it.)
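That "resolve the album once, then spawn uploads that already carry the ID" idea can be sketched as follows. Again, these are hypothetical names invented for illustration (uploadTask, createAlbumThenSpawn), not the project's real task types:

```go
package main

import "fmt"

// uploadTask is a hypothetical per-file job that already knows which
// album it belongs to, so the upload worker never has to look it up.
type uploadTask struct {
	albumID string
	file    string
}

// createAlbumThenSpawn resolves the album exactly once and returns one
// upload task per file, each carrying the resolved album ID. With this
// shape, no upload worker ever calls GetOrCreateAlbumByName itself, so
// the check-then-create race cannot occur.
func createAlbumThenSpawn(createAlbum func(name string) string, name string, files []string) []uploadTask {
	id := createAlbum(name) // called exactly once per album per run
	tasks := make([]uploadTask, 0, len(files))
	for _, f := range files {
		tasks = append(tasks, uploadTask{albumID: id, file: f})
	}
	return tasks
}

func main() {
	creates := 0
	create := func(name string) string { creates++; return "album-id-0" }
	tasks := createAlbumThenSpawn(create, "album-0", []string{"1.jpg", "2.jpg"})
	fmt.Println(len(tasks), creates) // 2 upload tasks, 1 create call
}
```

In a real implementation the create step would be its own queued task that enqueues the upload tasks on success, as described above; this sketch just shows the data flow.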
I've had a go at this here:
https://github.com/grumpyjames/gphotos-uploader-cli/commit/ff18a38e408d4ed74f8a892af47ad3a9cd133ac0
If we like this approach, I can probably try and put some tests around it (and fix the ones I broke!). Alternatively, feel free to use it as inspiration for your own change. Or if we hate it, please feel free to ignore it altogether :-)
I'm trying to fix it using a different approach: caching album information (mainly the ID) to avoid the Google Photos inconsistency on album creation. And for free, it reduces the number of calls to Google Photos.
@grumpyjames can you review it, please: #222
The work is already done here. I'm going to release v2 of the package and then I will modify gphotos-uploader-cli to use it.
v2 has a lot of improvements in performance and error handling. It uses a cache to reduce the number of calls to Google Photos, which also fixes this error.
This bug should be fixed in the latest version, can you confirm it?
I'm using v2.0 and I'm getting lots of duplicate album titles. The fix above (https://github.com/gphotosuploader/gphotos-uploader-cli/issues/192#issuecomment-667660880) still works (with go 1.13), but obviously it would be nice not to have duplicate album names in the first place. (Apart from this, the app works really well, thanks!)
Hi @jendrikseipp
Apologies for the behavior. Can you send me some logs of the CLI with debug mode enabled to understand better what is happening?
Thanks in advance
Thanks for looking into this! I sent you the debug log in an email.
Describe the bug
Subsequent uploads appear to create duplicate folder-name albums when not all files from a folder are uploaded in one batch.
To Reproduce
My steps were:
1. Configure the config file: makeAlbums: enabled: true, use: folderName, deleteAfterUpload: false
2. Perform the 1st upload. All ok.
3. After exceeding the API limit, perform the upload again after 20 hours.
4. Albums of the same name are created with the remaining files from each folder.
Expected behavior
A single album created for each folder.
Additional context
I have a 100GB account and am nowhere near the limit.