mozilla / sccache

Sccache is a ccache-like tool. It is used as a compiler wrapper and avoids compilation when possible. Sccache can cache compilation results in remote storage, including various cloud storage services, or in local storage.
Apache License 2.0

.sccache_check is on the hot path and causes rate limiting errors #2070

Open alexandrnikitin opened 9 months ago

alexandrnikitin commented 9 months ago

Hey, I'm seeing a lot of rate limiting errors at the storage check (S3 backend). The ".sccache_check" file used for that check is on the hot path. What do you think about making it configurable and exposing it as an environment variable? Each actor could then have its own file for the read/write access check. That would help mitigate the issue. WDYT?
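The suggestion could be sketched roughly like this in Rust. Note this is only an illustration, not sccache's actual code, and the environment variable name `SCCACHE_CHECK_KEY` is hypothetical:

```rust
use std::env;

// Hypothetical: resolve the check key from an env var override
// (variable name is illustrative, not an existing sccache option),
// falling back to the current hard-coded default.
fn check_key_from(var: Result<String, env::VarError>) -> String {
    var.unwrap_or_else(|_| ".sccache_check".to_string())
}

fn check_key() -> String {
    check_key_from(env::var("SCCACHE_CHECK_KEY"))
}

fn main() {
    // With no override set, every actor writes the same shared key;
    // with per-actor overrides, the check load is spread across keys.
    println!("check key: {}", check_key());
}
```

Each worker could then export a distinct value, so the read/write probe no longer hammers a single object.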

Example of the error:

storage write check failed: RateLimited (temporary) at Writer::write => S3Error { code: "SlowDown", message: "Please reduce your request rate.", resource: "", request_id: "T7HVSVY51KZ5E5ET" }

Context:
    response: Parts { status: 503, version: HTTP/1.1, headers: {"x-amz-request-id": "T7HVSVY51KZ5E5ET", "x-amz-id-2": "lx6IUMFEAgCQC32yIPFmwIV89vl9QnqkxzyyvYBg/VQTRtFC+21/dIrocKyworjoc/su/dQyyFA=", "content-type": "application/xml", "transfer-encoding": "chunked", "date": "Thu, 01 Feb 2024 00:32:06 GMT", "server": "AmazonS3", "connection": "close"} }
    service: s3
    path: .sccache_check

The code:

https://github.com/mozilla/sccache/blob/69be5321d2c2c125881b6edfed96676572b0ca03/src/cache/cache.rs#L481-L544

glandium commented 9 months ago

The check only happens when the server starts. How is that the hot path?

alexandrnikitin commented 9 months ago

I'm also surprised to see it from AWS. We have dozens of worker nodes and thousands of builds per day, which is not a crazy number, but I frequently see that error in the logs.

I see that others also reported the same or similar issues: https://github.com/mozilla/sccache/issues/1485 https://github.com/mozilla/sccache/issues/1485#issuecomment-1375160422 And PRs to mitigate it https://github.com/mozilla/sccache/pull/1557

orf commented 6 months ago

S3 has rate limits: many reads and writes to a single key can hit rate limits far before the underlying partition is rate limited. Even 20-30 PUTs to a single key within a very short period of time can exhaust it.

On versioned buckets this limit is lower, especially if many millions of versions already exist for the key.
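Given that the limit is effectively per key, one way to avoid piling every probe onto one object is to derive a distinct check key per actor. A minimal sketch, assuming a hypothetical `per_actor_check_key` helper (not part of sccache):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::{SystemTime, UNIX_EPOCH};

// Illustrative mitigation, not sccache's actual behavior: append a
// per-process hash suffix to the base key so concurrent actors write
// different objects instead of all PUTting the same key.
fn per_actor_check_key(base: &str) -> String {
    let mut h = DefaultHasher::new();
    std::process::id().hash(&mut h);
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before epoch")
        .as_nanos()
        .hash(&mut h);
    // 64-bit hash rendered as 16 hex chars.
    format!("{}_{:016x}", base, h.finish())
}

fn main() {
    let key = per_actor_check_key(".sccache_check");
    println!("this actor probes: {}", key);
}
```

The trade-off is that check objects accumulate in the bucket, so a lifecycle rule or cleanup pass would be needed alongside this.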