tus / tusd

Reference server implementation in Go of tus: the open protocol for resumable file uploads
https://tus.github.io/tusd
MIT License

GCS Rate limit exceeded. #343

Open PokeGuys opened 4 years ago

PokeGuys commented 4 years ago

Describe the bug Updating the .info file too frequently (more than one update per second) triggers the GCS object rate limit.

To Reproduce Steps to reproduce the behavior:

  1. Start uploading a file fast enough that more than one request per second is sent.
  2. See error
    googleapi: Error 429: The rate of change requests to the object storage/8f3ffb04745ec523b4689ce083ee868d.info exceeds the rate limit. Please reduce the rate of create, update, and delete requests.

Expected behavior Add exponential backoff and/or use object metadata to avoid the rate-limit error when updating the .info file.

Acconut commented 4 years ago

Thanks for reporting. If I understand correctly, this excerpt from the GCS documentation explains the behavior:

There is no limit to how frequently you can create or update different objects in a bucket. However, a single particular object can only be updated or overwritten up to once per second. For example, if you have an object bar in bucket foo, then you should only upload a new copy of foo/bar about once per second. Updating the same object more frequently than once per second may result in 429 Too Many Requests errors. You should retry failed requests using truncated exponential backoff.
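The "truncated exponential backoff" it recommends is roughly the following pattern. A minimal sketch in Go, not existing tusd code; the `op` callback is a stand-in for whatever performs the .info write:

```go
package gcsstore

import (
	"context"
	"time"
)

// retryWithBackoff retries op with truncated exponential backoff:
// wait 1s, 2s, 4s, ... capped at maxDelay, giving up after maxAttempts.
func retryWithBackoff(ctx context.Context, maxAttempts int, maxDelay time.Duration, op func() error) error {
	delay := time.Second
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		select {
		case <-time.After(delay):
		case <-ctx.Done():
			return ctx.Err()
		}
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return err
}
```

Jitter could be layered on top, but the overall shape stays the same.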

We would be happy to accept a PR for retries and/or alternative solutions, but we don't have the resources to work on this on our own.

Alternatively, you might want to reduce the number of requests you send for one upload. Sending more than 1 req/s for a single upload sounds wasteful to me. Is there a reason why you are doing this?

PokeGuys commented 4 years ago

The problem is that chunkSize cannot be controlled by server-side configuration. If the chunkSize is too small or the user's network is fast enough, the GCS rate limit is triggered because the Offset in the .info file keeps being updated.

Acconut commented 4 years ago

Do you control the clients and can you increase their chunkSize? In most cases it is not even necessary to set the chunkSize parameter at all.

And, as I said, we are open to PRs!

ghost commented 4 years ago

Hi!

@Acconut - how do you feel about adding generic debounce and throttle implementations to the code that can be applied to any store? I can try to take a crack at a PR soon.

If we make these configurable, they could be applied to any potential cloud provider. I'm already imagining DigitalOcean Spaces support and would consider this scenario a use case. From the documentation, I'll have to support these limits:

"Spaces have the following request rate limits: 750 requests (any operation) per IP address per second to all Spaces on an account. 150 PUT s, 150 DELETE s, 150 LIST s, and 240 other requests per second to any individual Space. 2 COPY s per 5 minutes on any individual object in a Space.Jun 19, 2018"

If the implementation is placed at the correct layer and made suitably generic, it should be a "2 birds with 1 stone" sort of fix. Do you agree?

Kind regards, rbastic

Acconut commented 4 years ago

@rbastic A PR would be amazing, but we should probably discuss the details first! We have never hit limits with AWS S3, and I seem to remember reading that the AWS SDK automatically handles retries for S3. But I am not entirely sure we set that up properly. Maybe a similar setting also exists for the GCS library, and we can save ourselves some time and simply use it.

Furthermore, we need to avoid decreasing performance when the rate limits are not hit. Without knowing the details of your proposed "generic debounce and throttle implementations", I currently prefer a reactive approach to handling rate limits, where we simply retry requests after some delay once a rate limit is reached. That sounds simpler and more stable to me than proactively throttling outgoing requests inside tusd.
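To make the reactive idea concrete, the interesting piece is classifying the error so the happy path stays untouched; something along these lines (a sketch under assumed names, not existing tusd code):

```go
package gcsstore

import (
	"errors"
	"net/http"

	"google.golang.org/api/googleapi"
)

// isRateLimited reports whether err is the 429 "rate of change requests"
// error GCS returns when a single object is updated too frequently.
func isRateLimited(err error) bool {
	var apiErr *googleapi.Error
	return errors.As(err, &apiErr) && apiErr.Code == http.StatusTooManyRequests
}
```

Only when a write fails this check would we wait and repeat it, for example with the truncated backoff the GCS docs recommend; all other errors keep their current behavior.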

What do you think?

ghost commented 4 years ago

Discussing details first is always wise :-)

IIRC, the limits for AWS S3 are 3,500 requests per second for writes and 5,500 requests per second for reads. That may be why neither of us is hitting them?

Retrying once limits have already been exhausted strikes me as unwise, due to cloud vendors possibly interpreting that as the program misbehaving. I am not sure though and would need to reach out to vendors to gather more details and confirm. Do you have experience or evidence that confirms this approach is OK to do?

Acconut commented 4 years ago

IIRC, the limits for AWS S3 are 3,500 requests per second for writes and 5,500 requests per second for reads. That may be why neither of us is hitting them?

That could also be the case, yes :)

cloud vendors possibly interpreting that as the program misbehaving.

Frankly, I am not concerned about that. Cloud vendors are likely used to programs reaching rate limits, and as long as one does not flood them with requests, I don't think negative consequences will follow.

Do you have experience or evidence that confirms this approach is OK to do?

Yes, we use this approach in some private projects for communicating with third-parties without any issues. In fact, tus-js-client also uses this mechanism when talking to tus servers.

If the GCS rate limit problem is not controllable by retries, we can talk about internal request throttling. But since that solution requires more work, I would like to give the retries a shot first.

ghost commented 4 years ago

Sensible. Do you have any preferences for a package or approach to use? I'm assuming we are referring to 'exponential retry' in this context. Any advice on which files/lines would be a good starting point would be much appreciated.

Acconut commented 4 years ago

I think we should first investigate whether we can use the retry logic in the AWS / GCS SDKs before implementing our own. AWS already seems to have something in place: https://docs.aws.amazon.com/sdk-for-go/api/aws/client/#pkg-constants and GCS as well: https://github.com/googleapis/google-cloud-go/blob/e4304e858fa0304fcf2dfd761ad4680cb0500ff1/storage/invoke.go#L26 So I am unsure if we even need to change something in tusd.
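If I read those links right, the knobs may already be there. A hedged sketch of raising the retry count on the S3 side with aws-sdk-go v1 (the package name, region, and retry count are placeholder values, not how tusd is currently configured):

```go
package s3store

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func newS3Client() *s3.S3 {
	// aws-sdk-go retries throttling errors with backoff on its own;
	// MaxRetries controls how many attempts are made before giving up.
	sess := session.Must(session.NewSession(&aws.Config{
		Region:     aws.String("us-east-1"),
		MaxRetries: aws.Int(5),
	}))
	return s3.New(sess)
}
```

The GCS client linked above appears to retry transient errors on its own as well; if that covers the 429s from this report, tusd might not need changes at all.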

ghost commented 4 years ago

Thanks for following up more - good find! I wasn't aware of this at all.

I'll review this further and get back to you soon.