man-group / ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
http://arcticdb.io

Add AWS STS authentication support #1884

Open phoebusm opened 1 month ago

phoebusm commented 1 month ago

Copy of https://github.com/man-group/ArcticDB/pull/1814 so @poodlewars can review

Reference Issues/PRs

https://github.com/man-group/ArcticDB/issues/1883

What does this implement or fix?

Add support for AWS STS authentication.

Any other comments?

STS (Security Token Service) is an AWS IAM service that lets a user account gain access to resources by assuming a role. First, the SDK sends an assume-role request to IAM. If authentication succeeds, IAM replies with a temporary access key. With that access key, the SDK has access to the resources. This logic, as well as the refresh of the temporary access key, is handled by the SDK. P.S. The validity of the temporary access key is checked only at the beginning of each IO.
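To make the flow above concrete, here is a minimal Python sketch of the "check validity at the beginning of each IO, refresh if expired" behaviour. It is not the SDK's implementation: `TemporaryCredentials`, `RefreshingCredentialProvider` and `make_fake_assume_role` are hypothetical names, and the assume-role call is faked so the example runs without AWS.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TemporaryCredentials:
    access_key_id: str
    secret_access_key: str
    session_token: str
    expires_at: float  # seconds, measured on the same clock as `now`


class RefreshingCredentialProvider:
    """Hypothetical stand-in for the credential handling done by the SDK."""

    def __init__(self,
                 assume_role: Callable[[], TemporaryCredentials],
                 now: Callable[[], float]) -> None:
        self._assume_role = assume_role
        self._now = now
        self._cached: Optional[TemporaryCredentials] = None
        self.refresh_count = 0

    def credentials_for_io(self) -> TemporaryCredentials:
        # Validity is checked only here, once at the start of an IO.
        # An IO already in flight keeps using the credentials it was given.
        if self._cached is None or self._now() >= self._cached.expires_at:
            self._cached = self._assume_role()
            self.refresh_count += 1
        return self._cached


def make_fake_assume_role(now: Callable[[], float],
                          lifetime_seconds: float) -> Callable[[], TemporaryCredentials]:
    """Fake assume-role request (assumption: a real one would go to AWS)."""
    issued = {"count": 0}

    def assume_role() -> TemporaryCredentials:
        issued["count"] += 1
        return TemporaryCredentials(
            access_key_id=f"FAKEKEY{issued['count']}",
            secret_access_key="fake-secret",
            session_token="fake-token",
            expires_at=now() + lifetime_seconds,
        )

    return assume_role
```

Because the check happens only at the start of an IO, a key that expires partway through a long IO is not replaced until the next IO begins.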

The maintainers of the S3 C++ SDK have declined to align this authentication method with other SDKs (e.g. boto3), which creates two problems:

  1. The role to be assumed must be specified in the API, but the role_arn, access key and ID must be specified in the shared config file
  2. The method is not part of the default credential provider chain. Although the SDK allows us to supply a customized chain that includes the STS authentication method, doing so has two drawbacks, which forced me to add a dedicated option to switch on STS authentication instead:
    • The SDK will not auto-refresh the temporary credentials, so ArcticDB would need to add tedious key-expiry detection and refresh logic
    • Extra maintenance is required, as the customized chain would need to stay aligned with the default one to keep the user experience consistent with the previous one
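For the first problem, this is roughly what the shared config file needs to contain. The sketch below renders it with Python's `configparser`. The keys `role_arn`, `source_profile`, `aws_access_key_id` and `aws_secret_access_key` are the standard AWS shared config keys; the function name, the profile names and every value are placeholders I made up for illustration, not what this PR uses.

```python
import configparser
import io


def render_sts_config(sts_profile: str,
                      base_profile: str,
                      role_arn: str,
                      access_key_id: str,
                      secret_access_key: str) -> str:
    """Render a shared config file with one role profile and its source profile."""
    config = configparser.ConfigParser()
    # Profile that names the role to assume and where the base credentials live.
    config[f"profile {sts_profile}"] = {
        "role_arn": role_arn,
        "source_profile": base_profile,
    }
    # Profile holding the long-lived key used to send the assume-role request.
    config[f"profile {base_profile}"] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder values only; substitute a real role ARN and key pair.
example_config = render_sts_config(
    sts_profile="sts_profile",
    base_profile="base_profile",
    role_arn="arn:aws:iam::123456789012:role/example-role",
    access_key_id="EXAMPLEKEYID",
    secret_access_key="example-secret",
)
```

The name of the role profile (`sts_profile` here) is then the part that has to be passed through the API.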

For the detailed setup of the STS method, refer to the content of this PR. The test added in this PR requires real S3. A temporary user, policy and role are created for the test in the pipeline.

The final caveat is that the config file seems to be loaded when ArcticDB is imported and is not refreshed for the entire lifetime of the Python interpreter. This required a hack in the testing framework to always create the config file at the beginning of any test run. As it's specific to the real S3 tests, it won't cause any pain for day-to-day local development and general PR tests.
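The ordering constraint can be sketched as below. This is not the actual code in the testing framework: `write_config_before_import` is a hypothetical helper, and pointing the SDK at the file through the `AWS_CONFIG_FILE` environment variable is an assumption on my side about how the file location is chosen.

```python
import os
from pathlib import Path


def write_config_before_import(config_text: str, directory: str) -> Path:
    """Create the shared config file and point AWS_CONFIG_FILE at it.

    This has to run before ArcticDB is imported, because the file appears
    to be read once at import time and never reloaded afterwards.
    """
    path = Path(directory) / "config"
    path.write_text(config_text)
    # Assumption: the SDK picks the config file location up from this variable.
    os.environ["AWS_CONFIG_FILE"] = str(path)
    return path
```

In a test run this would be called at the very top of the session setup, ahead of the first `import arcticdb`; writing the file later in the same interpreter would have no effect.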

Checklist

Checklist for code changes...

- [ ] Have you updated the relevant docstrings, documentation and copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error messages](https://docs.arcticdb.io/error_messages/)?
- [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?