Closed: hessamkhoshniat closed this issue 1 month ago
Hello Hessam,
I am trying to run a similar setup with AWS Batch on EC2 (not Fargate). If you use the AWS Batch executor for Snakemake, you will notice that the SNAKEMAKE_STORAGE_xxx credentials are passed in the 'command' option, so they end up logged in CloudTrail and visible in the AWS console. This is considered bad security practice.
That is why I went a different route: I decided not to forward the credentials used by the Snakemake 'orchestrator' process and instead used the 'job role' feature in AWS Batch (the job role should have the runtime permissions to run your tasks plus the permissions to read from/write to your input/output buckets). This can easily be achieved because most AWS runtimes (including boto3) will load credentials from the instance metadata if none are provided, as in the sketch below.
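A minimal sketch of that fallback, assuming the container runs under a job role with read access to the bucket (the bucket name is from this issue; 'input.txt' is a placeholder):

import boto3

# No explicit aws_access_key_id / aws_secret_access_key: boto3 walks its
# default credential chain, which includes the container credentials
# endpoint advertised via AWS_CONTAINER_CREDENTIALS_RELATIVE_URI on
# ECS/Fargate and AWS Batch, so the job role's temporary credentials
# are picked up automatically.
s3 = boto3.client("s3")
s3.download_file("snakemake-bucket", "input.txt", "input.txt")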
Unfortunately the S3 plugin does not allow this, but it is a trivial change to make. I have submitted a PR that does exactly this: https://github.com/snakemake/snakemake-storage-plugin-s3/pull/31. You might want to give it a try. It would be great to have your feedback on it.
Please take into account that this PR is my first attempt to contribute to the Snakemake ecosystem, so it might not be merged as-is into the repo.
Hello Jlafaye, thanks a lot for your comment and for your PR. We will give it a try and let you know the outcome.
FYI, that PR was closed, but a mostly similar PR was merged instead: https://github.com/snakemake/snakemake-storage-plugin-s3/pull/33
Is this resolved now?
No response so far. I assume this is resolved. Please reopen if the problem persists.
We have a Dockerized Snakemake pipeline with the input data stored in an S3 bucket, snakemake-bucket:
Snakefile:
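(The original Snakefile was not captured in this thread; a hypothetical minimal version using the Snakemake 8 storage syntax might look like the following, where the rule name, object keys, and shell command are all placeholders.)

# Register the S3 storage provider (snakemake-storage-plugin-s3).
storage:
    provider="s3",

rule process:
    input:
        # Read the input directly from the bucket named in this issue.
        storage("s3://snakemake-bucket/input.txt")
    output:
        # Kept local here; outputs could likewise be sent to the bucket,
        # e.g. via --default-storage-provider s3.
        "result.txt"
    shell:
        "cp {input} {output}"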
Dockerfile:
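(Likewise, the Dockerfile content is not shown; a hypothetical minimal version based on the image tag mentioned later in this report could be:)

FROM snakemake/snakemake:v8.15.2
# Assumption: the base image may not ship the S3 storage plugin, so
# install it explicitly.
RUN pip install snakemake-storage-plugin-s3
WORKDIR /workflow
COPY Snakefile .
ENTRYPOINT ["snakemake", "--cores", "1"]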
When we run the container with the following command, it downloads the input file, runs the pipeline, and stores the output in the bucket successfully:
docker run -it -e SNAKEMAKE_STORAGE_S3_ACCESS_KEY=**** -e SNAKEMAKE_STORAGE_S3_SECRET_KEY=**** our-snakemake:v0.0.10
However, when we deploy it as an AWS Batch job or an AWS Fargate task, it fails immediately with the following error. The image works fine locally and on an external VPS, but not on AWS Fargate. The file on the bucket is accessible and downloadable from inside the container on the AWS task, verified with:
/opt/conda/envs/snakemake/bin/python -c "import os; import boto3; s3 = boto3.resource('s3', aws_access_key_id=os.environ.get('SNAKEMAKE_STORAGE_S3_ACCESS_KEY'), aws_secret_access_key=os.environ.get('SNAKEMAKE_STORAGE_S3_SECRET_KEY')); my_bucket = s3.Bucket('snakemake-bucket'); [my_bucket.download_file(d.key, d.key) for d in my_bucket.objects.all()]; print(os.listdir())"
Snakemake Docker tag: snakemake/snakemake:v8.15.2
It seems AWS Fargate sets some environment variables, including AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, based on which boto3 decides that it needs AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID in addition to SNAKEMAKE_STORAGE_S3_SECRET_KEY and SNAKEMAKE_STORAGE_S3_ACCESS_KEY. So to run Snakemake on AWS Fargate, we have to set all four variables, or we have to unset AWS_CONTAINER_CREDENTIALS_RELATIVE_URI. It would be a good idea to note in the documentation that users on AWS also need to set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY: https://snakemake.readthedocs.io/en/stable/snakefiles/storage.html#credentials
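For reference, a sketch of the all-four-variables workaround, mirroring the docker run command above (duplicating the values this way is an assumption; on Fargate they would go into the task definition's environment instead):

docker run -it \
  -e SNAKEMAKE_STORAGE_S3_ACCESS_KEY=**** \
  -e SNAKEMAKE_STORAGE_S3_SECRET_KEY=**** \
  -e AWS_ACCESS_KEY_ID=**** \
  -e AWS_SECRET_ACCESS_KEY=**** \
  our-snakemake:v0.0.10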