rclone / rclone

"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files
https://rclone.org
MIT License

Google Photos: max concurrent write request 429 RESOURCE_EXHAUSTED #6920

Closed: sahilnarain closed this issue 11 months ago

sahilnarain commented 1 year ago

Output of rclone version

rclone v1.61.1

Describe the issue

The Google Photos backend hits the write quota, even though only a very small number of files is being written.

The Google Photos backend hits the write quota both with the default client ID and with my own Google Photos Library client ID. I have tried reducing --tpslimit to an unreasonably low value (about 0.05, i.e. 1 transaction every 20 seconds), but that still gets rate limited very quickly. I am only able to upload a small number of photos before being rate limited.

rclone move $file gphotos:album/$album/ --transfers 1 --checkers 1 --no-check-dest --low-level-retries 1 --retries 2 --tpslimit 0.05 -vvP

Sample output:

...
2023-04-05 13:47:24 DEBUG : pacer: low level retry 1/1 (error Quota exceeded for quota 'concurrent write request' of service 'photoslibrary.googleapis.com'. (429 RESOURCE_EXHAUSTED))
2023-04-05 13:47:24 DEBUG : pacer: Rate limited, increasing sleep to 1.258731557s
2023-04-05 13:47:24 DEBUG : test.jpeg : >Update: err=failed to create media item: Quota exceeded for quota 'concurrent write request' of service 'photoslibrary.googleapis.com'. (429 RESOURCE_EXHAUSTED)
2023-04-05 13:47:24 DEBUG : Google Photos path "album/test-album": >Put:
2023-04-05 13:47:24 ERROR : test.jpeg : Failed to copy: failed to create media item: Quota exceeded for quota 'concurrent write request' of service 'photoslibrary.googleapis.com'. (429 RESOURCE_EXHAUSTED)
2023-04-05 13:47:24 ERROR : test.jpeg : Not deleting source as copy failed: failed to create media item: Quota exceeded for quota 'concurrent write request' of service 'photoslibrary.googleapis.com'. (429 RESOURCE_EXHAUSTED)
...

This could possibly mean that the write handle is not being closed properly, causing the quota to max out quickly, after which I have to wait a couple of hours before the rate limit is lifted.

What's interesting is that, as far as I can see in my cloud console, no quotas are being hit on the API when using my own client ID. However, the 429 errors are only for the BatchCreateMediaItems method.

Please reach out if I can provide any additional information, and if someone can point me in the right direction, I can try taking a shot at fixing this too.

Thanks in advance!

ncw commented 1 year ago

Are you sure you are using your own client_id - do you see transactions against it? The rclone internal one is very overloaded. Note that you need to do rclone config reconnect if you add/change a client_id.
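
For example, if the remote is called gphotos: (placeholder name):

rclone config reconnect gphotos: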

> What's interesting is that, as far as I can see in my cloud console, no quotas are being hit on the API when using my own client ID. However, the 429 errors are only for the BatchCreateMediaItems method.

Google seems to change their rate limiting code every week, so I don't have much hope we can fix this. There are a lot of undocumented rate limits too.

> This could possibly mean that the write handle is not being closed properly, causing the quota to max out quickly, after which I have to wait a couple of hours before the rate limit is lifted.

I'm not sure I understand what you mean here?

sahilnarain commented 1 year ago

Thank you @ncw for the response, I'll try to elaborate and explain this a bit better (hopefully!)

> Are you sure you are using your own client_id - do you see transactions against it? The rclone internal one is very overloaded. Note that you need to do rclone config reconnect if you add/change a client_id.

Yes, absolutely sure. I wrote a script that pauses for a while and retries whenever there is an error. The usage and error graphs look something like this...

[Two screenshots from the Google Cloud console, taken 2023-04-05: API usage and error graphs]

> I'm not sure I understand what you mean here?

I understand that Google changes their error codes often. I've gone through their docs too and there isn't a very clear explanation there either. I guess they mean to be opaque, intentionally. However, from what I can gather, the upload flow sends the media bytes first and then creates a media item from them.

I suspect (and this is JUST a suspicion, purely based on observed behaviour and on the assumption that Google's error messages say what they mean) that somehow the write isn't closed or terminated properly, causing Google to think that a write from bytes to a media item is still ongoing. That would explain why even a low rate of 1 list and 1 write every ~40 seconds maxes out so quickly. I can't think of any other reason why this might be happening.

Reasonably, a synchronous upload of 1 file every 40 seconds should not cause problems with concurrent write requests since I'd expect a file to be "written" as a media item within 40 seconds (ref: output in my first comment).
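
For reference, here is a minimal sketch of that two-step flow against the documented Library API endpoints. It is illustrative only, not rclone's actual Go code; the access token and file path are placeholders.

# Illustrative sketch of the per-file flow: upload the bytes, then create a
# media item from the returned upload token. Placeholder token/path values.
import os
import requests

API = "https://photoslibrary.googleapis.com/v1"
ACCESS_TOKEN = "ya29.placeholder"  # placeholder OAuth2 access token

def upload_one(path):
    # Step 1: upload the raw bytes; the response body is an upload token.
    with open(path, "rb") as f:
        r = requests.post(
            f"{API}/uploads",
            headers={
                "Authorization": f"Bearer {ACCESS_TOKEN}",
                "Content-type": "application/octet-stream",
                "X-Goog-Upload-Content-Type": "image/jpeg",
                "X-Goog-Upload-Protocol": "raw",
            },
            data=f,
        )
    r.raise_for_status()
    upload_token = r.text

    # Step 2: turn the token into a media item. This is done once per file,
    # and it is the call that returns the 429 errors shown above.
    r = requests.post(
        f"{API}/mediaItems:batchCreate",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        json={"newMediaItems": [{"simpleMediaItem": {
            "uploadToken": upload_token,
            "fileName": os.path.basename(path),
        }}]},
    )
    r.raise_for_status()
    return r.json()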

Thoughts? @ncw

gpl commented 1 year ago

I've also seen this on my side. I've left it running out of curiosity and it's slowly chewing through my libraries, but at an effectively glacial pace. I seem to be able to upload roughly 50-60GB/day, but I'm not sure if that's related.

Elapsed time: 4w1d13h17m12.8s

sahilnarain commented 1 year ago

@gpl Exactly! 50-60 GB a day is still not bad. I worked around it by using a bash script which randomly pauses between subsequent photo uploads.

# Work around the rate limit: upload one file at a time, pause a random
# 1-180 seconds between uploads, and back off for 15 minutes on failure.
# Note: filenames containing spaces will break the inner loop.
for dir in */
do
  cd "$dir" || continue
  echo
  for file in $(ls -S .)
  do
    echo "##### $(ls | wc -l) photos left, $(du -m | cut -f1) MB #####"
    # random pause of 1-180 seconds before each upload
    for i in $(seq $((RANDOM % 180 + 1)) -1 1) ; do echo -ne "\r$i " ; sleep 1 ; echo -ne "\r"; done
    rclone move "$file" "gphotos:album/$dir" --transfers 1 --checkers 1 --no-check-dest --low-level-retries 1 --retries 1 --tpslimit 0.33 -vvP
    if [ $? -ne 0 ]
    then
      echo
      # back off for 15 minutes after a failed upload
      for i in $(seq 900 -1 1) ; do echo -ne "\r$i " ; sleep 1 ; echo -ne "\r"; done
    fi
  done
  cd ..
done

z49x2vmq commented 1 year ago

It looks like the current implementation of rclone's Google Photos backend uploads a single file and then calls batchCreate with a single upload token.

But the Google Photos API encourages users to call batchCreate with up to 50 upload tokens:

> Include as many NewMediaItems as possible in each call to batchCreate to minimize the total number of calls you have to make. At most you can include 50 items.
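
To illustrate, a batched version of the create step might look roughly like this (Python against the public Library API; batch_create is just an illustrative helper, not rclone code):

# Illustrative helper: create up to 50 media items from already-obtained
# upload tokens with a single batchCreate call.
import requests

API = "https://photoslibrary.googleapis.com/v1"

def batch_create(access_token, upload_tokens, album_id=None):
    assert len(upload_tokens) <= 50  # documented per-call limit
    body = {"newMediaItems": [
        {"simpleMediaItem": {"uploadToken": t}} for t in upload_tokens
    ]}
    if album_id:
        body["albumId"] = album_id
    r = requests.post(
        f"{API}/mediaItems:batchCreate",
        headers={"Authorization": f"Bearer {access_token}"},
        json=body,
    )
    r.raise_for_status()
    return r.json()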

ncw commented 1 year ago

@z49x2vmq dropbox has an upload batcher. Maybe this could be generalized for use by google photos too. It isn't straightforward to batch things due to the concurrent nature of rclone.

z49x2vmq commented 1 year ago

Sorry I couldn't be much help here. I just started learning Go.

But here's what I did with a Python script to upload 28k photos.

Trial 1

pseudo code
loop:
   upload 50 images in series
   call batchCreate

This approach doesn't fully utilize my internet bandwidth (100 Mbps).

Trial 2

pseudo code
create a Thread Pool(size=6):
  Thread 1:
    upload images
    save upload token to shared list
    when shared list length = 50:
      call batchCreate with 50 tokens
  Thread 2:
    upload images
    save upload token to shared list
    when shared list length = 50:
      call batchCreate with 50 tokens
  .
  .
  .

This approach fully utilized network bandwidth.
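
In runnable Python, the Trial 2 idea looks roughly like this (upload_bytes and batch_create are the hypothetical helpers from the earlier sketches; the lock is what makes the shared token list safe across threads):

# Sketch of Trial 2: a pool of uploader threads feeding a shared,
# lock-protected list of upload tokens that is flushed 50 at a time.
import threading
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 50
lock = threading.Lock()
pending_tokens = []

def upload_and_maybe_flush(path, access_token):
    token = upload_bytes(access_token, path)  # hypothetical: POST /v1/uploads
    flush = None
    with lock:
        pending_tokens.append(token)
        if len(pending_tokens) >= BATCH_SIZE:
            flush = pending_tokens.copy()
            pending_tokens.clear()
    if flush:
        batch_create(access_token, flush)     # one call for up to 50 items

def upload_all(paths, access_token):
    with ThreadPoolExecutor(max_workers=6) as pool:
        futures = [pool.submit(upload_and_maybe_flush, p, access_token) for p in paths]
        for fut in futures:
            fut.result()  # surface any upload errors
    if pending_tokens:  # flush the final partial batch
        batch_create(access_token, pending_tokens)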

Some ways I can imagine (without knowing the capabilities of Go or rclone):

If I have more ideas after learning Go and rclone, I will come back.

ncw commented 1 year ago

I've been experimenting with factoring the dropbox batcher out and I've managed to apply it to the google photos backend.

Please give this a go. I suggest trying it with --transfers 16; you can go higher, up to 50. You can also try --gphotos-batch-mode async, which will be faster at the cost of not reporting errors to the user. Again, set --transfers.

It seemed to work well in my tests but it isn't heavily tested.

v1.65.0-beta.7359.54e32fa26.fix-dropbox-batcher on branch fix-dropbox-batcher (uploaded in 15-30 mins)
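
For example (source path and album name are placeholders):

rclone move /path/to/photos gphotos:album/my-album --transfers 16 -P
rclone move /path/to/photos gphotos:album/my-album --transfers 32 --gphotos-batch-mode async -P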

Here is the help for the new options:

--gphotos-batch-mode

Upload file batching sync|async|off.

This sets the batch mode used by rclone.

This has 3 possible values:

- off - no batching
- sync - batch uploads and check completion (default)
- async - batch uploads and don't check completion

Rclone will close any outstanding batches when it exits, which may cause a delay on quit.


--gphotos-batch-size

Max number of files in upload batch.

This sets the batch size of files to upload. It has to be less than 50.

By default this is 0, which means rclone will calculate the batch size depending on the setting of batch_mode.

Rclone will close any outstanding batches when it exits, which may cause a delay on quit.

Setting this is a great idea if you are uploading lots of small files, as it will make the upload a lot quicker. You can use --transfers 32 to maximise throughput.


--gphotos-batch-timeout

Max time to allow an idle upload batch before uploading.

If an upload batch is idle for more than this long then it will be uploaded.

The default for this is 0 which means rclone will choose a sensible default based on the batch_mode in use.


--gphotos-batch-commit-timeout

Max time to wait for a batch to finish committing


ncw commented 11 months ago

I've merged this to master now, which means it will be in the latest beta in 15-30 minutes and released in v1.65.