romange / helio

A modern framework for backend development based on io_uring Linux interface
Apache License 2.0

feat(aws): add aws sdk support #146

Closed andydunstall closed 12 months ago

andydunstall commented 1 year ago

Adds support for using the AWS SDK.

Using the AWS SDK isn't ideal: it doesn't work that well with fibers, and it is slow and complex to build given its many dependencies. Still, it seems like the simplest and fastest way to build a reliable S3 client for snapshots. Long term we may want to copy the subset of the SDK we use and re-implement it, though that would still be a lot of work given there's a lot of complexity and edge cases...

Changes

Logger

Adds a custom AWS logger that uses glog, rather than the default AWS logger, which writes to a file and so requires separate configuration.

This has some duplication, though I'm not sure how to avoid it. Since we're using glog, if a log level is disabled the right-hand side of the log statement won't run, including message_stream.str(), which does a copy. So if we refactored that into a shared method, we'd be copying the message even when it isn't logged...
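For reference, a minimal sketch of this kind of glog-backed logger, assuming the Aws::Utils::Logging::LogSystemInterface shape from recent SDK versions (the class and method bodies here are illustrative, not the PR's exact code):

```cpp
#include <aws/core/utils/logging/LogLevel.h>
#include <aws/core/utils/logging/LogSystemInterface.h>
#include <aws/core/utils/memory/stl/AWSStringStream.h>
#include <glog/logging.h>

#include <cstdarg>
#include <cstdio>

// Illustrative glog-backed logger. Each level keeps its own LOG(...) statement so
// glog's lazy evaluation skips message_stream.str() when the level is disabled,
// which is the duplication mentioned above.
class GlogLogger : public Aws::Utils::Logging::LogSystemInterface {
 public:
  Aws::Utils::Logging::LogLevel GetLogLevel() const override {
    return Aws::Utils::Logging::LogLevel::Info;
  }

  void Log(Aws::Utils::Logging::LogLevel level, const char* tag, const char* format,
           ...) override {
    // Format the printf-style message, then reuse LogStream.
    va_list args;
    va_start(args, format);
    char buf[1024];
    vsnprintf(buf, sizeof(buf), format, args);
    va_end(args);
    Aws::OStringStream stream;
    stream << buf;
    LogStream(level, tag, stream);
  }

  void LogStream(Aws::Utils::Logging::LogLevel level, const char* tag,
                 const Aws::OStringStream& message_stream) override {
    using Aws::Utils::Logging::LogLevel;
    switch (level) {
      case LogLevel::Fatal:
      case LogLevel::Error:
        LOG(ERROR) << tag << ": " << message_stream.str();
        break;
      case LogLevel::Warn:
        LOG(WARNING) << tag << ": " << message_stream.str();
        break;
      default:
        VLOG(1) << tag << ": " << message_stream.str();
        break;
    }
  }

  // Required by newer SDK versions; drop if your SDK's interface lacks it.
  void Flush() override {}
};
```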

Credentials

Adds a custom credential provider chain that supports:

- environment variables
- the local profile configuration file
- EC2 instance metadata

Reading the environment won't block, and the EC2 metadata provider uses the non-blocking HTTP client, which is fine. Loading the local profile configuration does use a blocking system call to read the file, but this happens very rarely (once on startup, then whenever the credentials expire, if they ever expire), so less than a millisecond of thread blocking no more than once every few hours seems ok for now. It will also only block while a snapshot is in progress, which already has overhead.
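As a rough sketch (not the PR's exact code), such a chain can subclass the SDK's Aws::Auth::AWSCredentialsProviderChain, whose protected AddProvider registers the three sources in order:

```cpp
#include <aws/core/auth/AWSCredentialsProvider.h>
#include <aws/core/auth/AWSCredentialsProviderChain.h>

// Illustrative chain: environment first (never blocks), then the profile config
// file (rare blocking read), then EC2 instance metadata (which goes through the
// SDK's HTTP client, replaced in this PR by a non-thread-blocking one).
class SnapshotCredentialsProviderChain : public Aws::Auth::AWSCredentialsProviderChain {
 public:
  SnapshotCredentialsProviderChain() {
    AddProvider(Aws::MakeShared<Aws::Auth::EnvironmentAWSCredentialsProvider>("snapshot"));
    AddProvider(Aws::MakeShared<Aws::Auth::ProfileConfigFileAWSCredentialsProvider>("snapshot"));
    AddProvider(Aws::MakeShared<Aws::Auth::InstanceProfileCredentialsProvider>("snapshot"));
  }
};
```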

HTTP Client

We must extend the SDK with a non-thread-blocking HTTP client.

Thread Safe

The SDK creates a global HTTP client for EC2 metadata that may be used by any thread when fetching new credentials, so the HTTP client must be thread safe and usable from multiple proactors. This also means we only need a single S3 client that can be used by all threads, which in turn means we only have to load the config and credentials once.

Since socket connections must be created and used by the same proactor, and we must never block the thread, the simplest way to make the HTTP client thread safe is to make it stateless for now and create a new connection per request.

Since our S3 requests are large (8MB request/response), the connection overhead is negligible, so reconnecting on each request seems ok. Benchmarking didn't show any difference with or without connection caching. Later we could have HttpClient::MakeRequest fetch a set of cached connections for the current proactor from an array like proactor_clients_[proactor.id], avoiding a global thread-blocking lock. (Connecting may have a bigger overhead with HTTPS, though so far we only support HTTP, so leaving that for now.)
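As a skeleton, the stateless client looks roughly like the sketch below. The MakeRequest signature matches recent AWS SDK versions, and FiberRoundTrip is a hypothetical stand-in for the helio fiber/socket code that connects on the calling proactor, performs the exchange without blocking the thread, and disconnects:

```cpp
#include <aws/core/http/HttpClient.h>
#include <aws/core/http/HttpRequest.h>
#include <aws/core/http/HttpResponse.h>
#include <aws/core/http/standard/StandardHttpResponse.h>

// Hypothetical helper implemented elsewhere with helio fibers/sockets: open a
// connection, write the request, read the response, close the connection.
void FiberRoundTrip(const Aws::Http::HttpRequest& request, Aws::Http::HttpResponse* response);

// Stateless, thread-safe client sketch: no shared connection state, so any
// proactor thread may call MakeRequest concurrently.
class FiberHttpClient : public Aws::Http::HttpClient {
 public:
  std::shared_ptr<Aws::Http::HttpResponse> MakeRequest(
      const std::shared_ptr<Aws::Http::HttpRequest>& request,
      Aws::Utils::RateLimits::RateLimiterInterface* /*read_limiter*/,
      Aws::Utils::RateLimits::RateLimiterInterface* /*write_limiter*/) const override {
    auto response = Aws::MakeShared<Aws::Http::Standard::StandardHttpResponse>(
        "FiberHttpClient", request);
    FiberRoundTrip(*request, response.get());
    return response;
  }
};
```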

Retries

When the AWS client retries, it backs off by putting the thread to sleep. Fortunately this sleep is done by the HTTP client in HttpClient::RetryRequestSleep, but unfortunately the method isn't virtual, so it can't be overridden by our custom HTTP client. Therefore this applies a simple patch in CMake to make the method virtual, so we can block the fiber rather than the thread.
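With the patch applied, the override itself is tiny. A sketch, where the exact helio fiber-sleep helper name is an assumption:

```cpp
// Member of the custom HTTP client (e.g. the FiberHttpClient sketch above), once
// the CMake patch has made HttpClient::RetryRequestSleep virtual.
void RetryRequestSleep(std::chrono::milliseconds sleep_time) override {
  // Suspend only the calling fiber during retry backoff, so the proactor thread
  // keeps serving other fibers. The helper name below is assumed, not verified.
  util::fb2::ThisFiber::SleepFor(sleep_time);
}
```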

Note the HttpClient has methods like DisableRequestProcessing/EnableRequestProcessing for pausing request processing, though they are only used for testing, so we don't implement them in our client; doing so would be complex given the client must be thread safe.

Redirects

The SDK handles redirects for us, so we don't need to handle them in the HTTP client.

Performance

Upload

When uploading, to map AWS HTTP requests to Boost HTTP requests, this uses boost::interprocess::bufferstream so that the Aws::IOStream uses a fixed underlying buffer; the HTTP client then writes that underlying buffer directly to the socket and avoids any copies. This is a bit of a hack but significantly improves performance.
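Roughly, the trick looks like this (names are illustrative and error handling is omitted): the part buffer is wrapped in a bufferstream, an std::iostream over fixed memory, so the SDK sees a normal stream body while the HTTP client can send the same bytes directly.

```cpp
#include <cstddef>

#include <aws/s3/model/UploadPartRequest.h>
#include <boost/interprocess/streams/bufferstream.hpp>

// part_buf holds the part to upload (e.g. 8MB); the bufferstream reads/writes
// that memory in place instead of copying it into a string stream.
void SetPartBody(Aws::S3::Model::UploadPartRequest* req, char* part_buf, std::size_t len) {
  auto body =
      Aws::MakeShared<boost::interprocess::bufferstream>("S3WriteFile", part_buf, len);
  req->SetBody(body);  // the Aws::IOStream is backed directly by part_buf
  req->SetContentLength(static_cast<long long>(len));
}
```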

The remaining performance issues are:

- the MD5 checksum header computed for each uploaded part
- signing the payload (a SHA-256 of the body included in the request signature)

The old client doesn't do either of these (its signature only includes the headers).

Benchmarking a 5GB upload (using s3v2_demo):

So without the MD5 header and without signing the payload (the same behaviour as the old client), performance matches the old client. Adding the MD5 and payload signature adds latency and resource usage.

Having both the MD5 and the payload signature seems redundant, so we can remove the MD5 header and keep just the payload signature. Removing the MD5 will require patching the SDK to allow disabling the checksum in the UploadPartRequest. (Though the MD5 is a SOC2 requirement, so it depends on whether a SHA-256 payload signature is enough.)

Note the SDK only includes the payload in the signature when using HTTP rather than HTTPS, so if we add HTTPS support we may have to add back the MD5 header.

Note that if high resource usage becomes an issue and is too disruptive to normal traffic, we can add client-side rate limiting to slow uploads down.

Downloads

The download is a bit slower than with the old client, though it's not as much of an issue given it only happens once on boot. Once integrated with Dragonfly we can benchmark and look at improving it.

Timeouts

The HTTP client is configured with connection and request timeouts, which we should support; will add in another PR.

HTTPS

Not including HTTPS support in this PR. Can include later if required.

Custom Endpoint

We need to be able to configure custom S3 endpoints to test locally (such as MinIO).

According to https://docs.aws.amazon.com/sdkref/latest/guide/feature-ss-endpoints.html, the C++ SDK does not support configuring service-specific custom endpoints, so this uses a custom S3 endpoint provider to support custom endpoints ourselves.

S3

The SDK has a 'transfer manager', though it is built for uploading local files rather than our 'writer' and 'reader' file interface, and it also has a complex 'executor' event loop which we don't need.

Therefore we add our own S3ReadFile and S3WriteFile implementations.

The S3ReadFile uses GetObject, which accepts a byte range to fetch, so we can fetch large objects in 10MB chunks.
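For example, one chunked read might look like this (bucket, key, and sizes are illustrative; error handling is omitted):

```cpp
#include <cstdint>
#include <string>

#include <aws/s3/S3Client.h>
#include <aws/s3/model/GetObjectRequest.h>

// Reads one ~10MB chunk of the object starting at `offset`.
void ReadChunk(Aws::S3::S3Client& client, uint64_t offset, uint64_t chunk_size) {
  Aws::S3::Model::GetObjectRequest req;
  req.SetBucket("my-bucket");
  req.SetKey("snapshot/part-0000");
  // HTTP Range header: byte offsets are inclusive on both ends.
  std::string range = "bytes=" + std::to_string(offset) + "-" +
                      std::to_string(offset + chunk_size - 1);
  req.SetRange(range.c_str());

  auto outcome = client.GetObject(req);
  if (outcome.IsSuccess()) {
    Aws::IOStream& body = outcome.GetResult().GetBody();
    // Stream `body` into the caller's buffer here.
    (void)body;
  }
}
```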

The S3WriteFile uses multipart uploads with CreateMultipartUpload, UploadPart, and CompleteMultipartUpload, where we upload 10MB parts and S3 aggregates the result.
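A rough sketch of that flow (error handling and the part-buffer plumbing are omitted; names are illustrative):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

#include <aws/s3/S3Client.h>
#include <aws/s3/model/CompleteMultipartUploadRequest.h>
#include <aws/s3/model/CompletedMultipartUpload.h>
#include <aws/s3/model/CompletedPart.h>
#include <aws/s3/model/CreateMultipartUploadRequest.h>
#include <aws/s3/model/UploadPartRequest.h>

// Each entry in `parts` is an Aws::IOStream over a ~10MB buffer.
void MultipartUpload(Aws::S3::S3Client& client, const Aws::String& bucket,
                     const Aws::String& key,
                     const std::vector<std::shared_ptr<Aws::IOStream>>& parts) {
  Aws::S3::Model::CreateMultipartUploadRequest create_req;
  create_req.SetBucket(bucket);
  create_req.SetKey(key);
  const Aws::String upload_id =
      client.CreateMultipartUpload(create_req).GetResult().GetUploadId();

  Aws::S3::Model::CompletedMultipartUpload completed;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    const int part_number = static_cast<int>(i) + 1;  // part numbers start at 1

    Aws::S3::Model::UploadPartRequest part_req;
    part_req.SetBucket(bucket);
    part_req.SetKey(key);
    part_req.SetUploadId(upload_id);
    part_req.SetPartNumber(part_number);
    part_req.SetBody(parts[i]);
    auto part_outcome = client.UploadPart(part_req);

    // Record each part's ETag so S3 can assemble the final object.
    Aws::S3::Model::CompletedPart done;
    done.SetPartNumber(part_number);
    done.SetETag(part_outcome.GetResult().GetETag());
    completed.AddParts(done);
  }

  Aws::S3::Model::CompleteMultipartUploadRequest complete_req;
  complete_req.SetBucket(bucket);
  complete_req.SetKey(key);
  complete_req.SetUploadId(upload_id);
  complete_req.SetMultipartUpload(completed);
  client.CompleteMultipartUpload(complete_req);
}
```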

Both of these match how the official SDKs handle large downloads and uploads (mostly based on reading the Go SDK).

Testing and Performance

Added some upload/download benchmarks above.

Will add more benchmarks and testing in the Dragonfly integration PR (see https://github.com/dragonflydb/dragonfly/pull/1929). Also assuming we leave automated regression tests to Dragonfly rather than Helio...

codecov-commenter commented 1 year ago

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (4c261c1) 77.57% compared to head (f769462) 77.48%. Report is 3 commits behind head on master.


Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #146      +/-   ##
==========================================
- Coverage   77.57%   77.48%   -0.10%
==========================================
  Files          98       98
  Lines        7136     7165      +29
==========================================
+ Hits         5536     5552      +16
- Misses       1600     1613      +13
```

[see 12 files with indirect coverage changes](https://app.codecov.io/gh/romange/helio/pull/146/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Roman+Gershman)


romange commented 1 year ago

MD5 is a SOC2 requirement - how do you know this?

andydunstall commented 1 year ago

> MD5 is a SOC2 requirement - how do you know this?

@ashotland told me about it before:

> Customers should utilize Amazon S3’s option to specify an MD5 checksum as part of a REST PUT operation for the data being sent to Amazon S3. When the request arrives at Amazon S3, an MD5 checksum will be recalculated for the object data received and compared to the provided MD5 checksum. If there is a mismatch, the PUT will be failed, preventing data that was corrupted on the wire from being written into Amazon S3.

royjacobson commented 1 year ago

> MD5 is a SOC2 requirement - how do you know this?
>
> @ashotland told me about it before
>
> Customers should utilize Amazon S3’s option to specify an MD5 checksum as part of a REST PUT operation for the data being sent to Amazon S3. When the request arrives at Amazon S3, an MD5 checksum will be recalculated for the object data received and compared to the provided MD5 checksum. If there is a mismatch, the PUT will be failed, preventing data that was corrupted on the wire from being written into Amazon S3.

If it's only for non-malicious data integrity (and using MD5 is really insecure for anything else), this page says we can use CRC32 instead.

andydunstall commented 1 year ago

> If it's only for non-malicious data integrity (and using MD5 is really insecure for anything else), this page says we can use CRC32 instead.

We have the payload signature for malicious data integrity, so I guess it's ok, though I'm not sure how SOC2 works and whether that's acceptable (cc @ashotland?). (Given we have the SHA-256 signature, the checksum seems redundant anyway?)

andydunstall commented 1 year ago

I've provisionally updated to CRC32, which has little CPU overhead.

A 5GB upload with CRC32 + a SHA-256 payload signature performs the same as a SHA-256 payload signature without any checksum (~25% CPU, 30s).
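A rough sketch of what selecting CRC32 on each part looks like, assuming the SDK's ChecksumAlgorithm setter (the actual change in this PR may differ):

```cpp
#include <aws/s3/model/ChecksumAlgorithm.h>
#include <aws/s3/model/UploadPartRequest.h>

void UseCrc32(Aws::S3::Model::UploadPartRequest* req) {
  // Attach a CRC32 checksum to the part instead of Content-MD5; S3 verifies it
  // server-side, which still catches on-the-wire corruption.
  req->SetChecksumAlgorithm(Aws::S3::Model::ChecksumAlgorithm::CRC32);
}
```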

ashotland commented 1 year ago

> If it's only for non-malicious data integrity (and using MD5 is really insecure for anything else), this page says we can use CRC32 instead.
>
> We have the payload signature for malicious data integrity, so I guess it's ok, though I'm not sure how SOC2 works and whether that's acceptable (cc @ashotland?). (Given we have the SHA-256 signature, the checksum seems redundant anyway?)

Yep, as long as we verify integrity of the data we should be good.

andydunstall commented 1 year ago

Agreed with Roman; will add an option to disable the payload SHA-256 (so the signature only includes the headers).

Then on the control plane, when it's inside a VPC, we can disable it, though users can optionally enable it when using the community version.