webrecorder / browsertrix-crawler

Run a high-fidelity browser-based crawler in a single Docker container
https://crawler.docs.browsertrix.com
GNU Affero General Public License v3.0
607 stars, 79 forks

Can an AWS alternative to Access Keys be added? #644

Open jblukach opened 1 month ago

jblukach commented 1 month ago

I assembled a Python Cloud Development Kit (CDK) stack that runs the Browsertrix Crawler Docker container as an ECS Fargate task.

I try to avoid IAM users at all costs by using IAM roles instead. Could the container be configured to use the task policy first and, if that is not available, fall back to the access key?

https://github.com/jblukach/lunkerzero/blob/main/lunkerzero/lunkerzero_inspection.py#L167

It could potentially help with issue #448, which is to have Browsertrix Crawler run as a docker Lambda container.

ikreymer commented 1 month ago

Hi @jblukach - first, thank you for your support, we really appreciate it!

Do you have an example / more info of what would be needed for this? Would this be in place of using the access key env vars for storage? What is needed to make use of the task policy? We would probably also need to finish #547, since this is probably not supported by the minio client library. We haven't been using AWS and are slightly hesitant to focus on a specific environment, but it is AWS, and maybe there's a way to do it without much impact...

jblukach commented 1 month ago

I appreciate all the effort put into Webrecorder, which is a big project!

I was wondering if it would be possible to use temporary credentials instead of the static access keys.

https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_use-resources.html#using-temp-creds-sdk-cli

The container inherits its permissions from the task policy, which generates temporary credentials at runtime. These should be accessible from the following environment variables:

AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN

I saw some past minio issues about this whose outcome is a bit unclear, but it does appear that support was added, maybe?

https://min.io/docs/minio/linux/developers/python/API.html

It looks like multipart uploads were added, too; only CreateSession is still unsupported.

https://min.io/docs/minio/linux/reference/s3-api-compatibility.html

Is it possible to test for STORE_ACCESS_KEY first, since it is nice not to be tied to a specific environment, and, if it is not available, try the AWS temporary credentials?
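To make the proposed lookup order concrete, here is a minimal sketch in Python of what such a fallback could look like. It is not how Browsertrix Crawler currently resolves credentials. The function name `resolve_credentials`, the `StorageCredentials` type, and the pairing of `STORE_ACCESS_KEY` with a `STORE_SECRET_KEY` variable are assumptions made for illustration. It also assumes the temporary credentials really are exposed through the three `AWS_*` environment variables listed above.

```python
import os
from typing import Mapping, NamedTuple, Optional


class StorageCredentials(NamedTuple):
    """Hypothetical container for whichever credentials were found."""
    access_key: str
    secret_key: str
    session_token: Optional[str]  # None when static keys are used
    source: str                   # "store" or "aws", for logging/debugging


def resolve_credentials(
    env: Mapping[str, str] = os.environ,
) -> Optional[StorageCredentials]:
    """Prefer static STORE_* keys, then fall back to AWS_* variables."""
    # 1. Environment-agnostic static keys (variable names are assumed).
    if env.get("STORE_ACCESS_KEY") and env.get("STORE_SECRET_KEY"):
        return StorageCredentials(
            env["STORE_ACCESS_KEY"], env["STORE_SECRET_KEY"], None, "store"
        )
    # 2. AWS credentials from the environment; the session token is only
    #    present when the credentials are temporary.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return StorageCredentials(
            env["AWS_ACCESS_KEY_ID"],
            env["AWS_SECRET_ACCESS_KEY"],
            env.get("AWS_SESSION_TOKEN"),
            "aws",
        )
    # 3. Nothing usable was found; the caller decides how to fail.
    return None
```

Passing the environment in as a mapping keeps the lookup order easy to test without touching the real process environment. Swapping the two `if` blocks would give the "task policy first" order suggested at the top of this issue.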