nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

AWS profile not picked up when using Nextflow Fusion #4868

Open rcannood opened 5 months ago

rcannood commented 5 months ago

Bug report

When I run a Nextflow workflow with Nextflow Fusion (input, output, and work dir all on S3), Nextflow doesn't seem to pick up my AWS_PROFILE environment variable when resolving the work directory.

script.sh:

#!/bin/bash

# set AWS profile
export AWS_PROFILE=di

# create config
cat > /tmp/nextflow.config <<EOF
fusion {
    enabled = true
    exportStorageCredentials = true
}

wave {
    enabled = true
}
EOF

# run component
DATA=s3://data-intuitive-tmp/test-nextflow-wave-fusion/resources

NXF_VER=23.10.1 nextflow run \
  viash-io/test-nextflow-wave-fusion \
  -r main_build \
  -main-script target/nextflow/method/main.nf \
  -profile docker \
  -latest \
  -w "s3://data-intuitive-tmp/test-nextflow-wave-fusion/work/github_issue" \
  -c /tmp/nextflow.config \
  --input $DATA/input1.txt \
  --multiple_input "$DATA/input1.txt;$DATA/input2.txt" \
  --publish_dir "s3://data-intuitive-tmp/test-nextflow-wave-fusion/output/github_issue"
Script output:

N E X T F L O W  ~  version 23.10.1
Pulling viash-io/test-nextflow-wave-fusion ...
 Already-up-to-date
Launching `https://github.com/viash-io/test-nextflow-wave-fusion` [elegant_linnaeus] DSL2 - revision: 33e8c0898e [main_build]
executor >  local (fusion enabled) (1)
[0c/11dfda] process > method:processWf:method_process (run)          [  0%] 0 of 1
[-        ] process > method:publishStatesSimpleWf:publishStatesProc -
ERROR ~ Error executing process > 'method:processWf:method_process (run)'

Caused by:
  Process `method:processWf:method_process (run)` terminated with an error exit status (127)

Command executed:
  ...

Command exit status:
  127

Command output:
  (empty)

Command error:
  10:42AM ERR reading from data store error="unknown schema 's3' data store for path '/s3/data-intuitive-tmp/test-nextflow-wave-fusion/work/github_issue/0c/11dfda6a9cfec968a2f0c2c7008339'" path=/s3/data-intuitive-tmp/test-nextflow-wave-fusion/work/github_issue/0c/11dfda6a9cfec968a2f0c2c7008339
  10:42AM WRN timeout waiting for FUSE mount
  10:42AM INF shutdown filesystem start
  10:42AM INF shutdown filesystem done
  10:42AM WRN using anonymous credentials to connect S3 region=us-east-1
  10:42AM INF Not mounting S3 data store - operation error S3: PutObject, https response error StatusCode: 403, RequestID: HZSVEPCTVBNBXTW5, HostID: IIc0G2vr2I4+90ewrfvO2wEB4kMGyyJZbPkqJRLmXEkFf3l0RU4q/okdmE2AUiT4bNm5XfYnMlg=, api error AccessDenied: Access Denied
  10:42AM ERR no datastore error="unknown schema 's3' data store for path '/s3/data-intuitive-tmp/test-nextflow-wave-fusion/work/github_issue/0c/11dfda6a9cfec968a2f0c2c7008339'"
  [the same FUSE-mount retry cycle repeats at 10:43AM, 10:44AM, and 10:45AM; every S3 PutObject attempt fails with 403 AccessDenied under anonymous credentials]
  10:43AM ERR creating .command.log file error="open /fusion/s3/data-intuitive-tmp/test-nextflow-wave-fusion/work/github_issue/0c/11dfda6a9cfec968a2f0c2c7008339/.command.log: no such file or directory"
  bash: /fusion/s3/data-intuitive-tmp/test-nextflow-wave-fusion/work/github_issue/0c/11dfda6a9cfec968a2f0c2c7008339/.command.run: No such file or directory
  10:43AM INF shutdown filesystem start
  10:43AM ERR on FUSE sigterm send error="os: process already finished"
  10:43AM INF shutdown filesystem done

executor >  local (fusion enabled) (1)
[0c/11dfda] process > method:processWf:method_process (run)          [100%] 1 of 1, failed: 1 ✘
[-        ] process > method:publishStatesSimpleWf:publishStatesProc -

Expected behavior and actual behavior

I would expect Nextflow to pick up my AWS_PROFILE (di). I verified that the profile is indeed working: when I don't use Fusion (input and output on S3, work dir local(!!)), Nextflow manages to fetch data from and publish data to my private S3 bucket:

script_nofusion.sh:

#!/bin/bash

# set AWS profile
export AWS_PROFILE=di

# run component
DATA=s3://data-intuitive-tmp/test-nextflow-wave-fusion/resources

NXF_VER=23.10.1 nextflow run \
  viash-io/test-nextflow-wave-fusion \
  -r main_build \
  -main-script target/nextflow/method/main.nf \
  -profile docker \
  -latest \
  --input $DATA/input1.txt \
  --multiple_input "$DATA/input1.txt;$DATA/input2.txt" \
  --publish_dir "s3://data-intuitive-tmp/test-nextflow-wave-fusion/output/github_issue"
Script output:

N E X T F L O W  ~  version 23.10.1
Pulling viash-io/test-nextflow-wave-fusion ...
 Already-up-to-date
Launching `https://github.com/viash-io/test-nextflow-wave-fusion` [jovial_ramanujan] DSL2 - revision: 33e8c0898e [main_build]
executor >  local (2)
[79/0affac] process > method:processWf:method_process (run)                [100%] 1 of 1 ✔
[68/5d6ea2] process > method:publishStatesSimpleWf:publishStatesProc (run) [100%] 1 of 1 ✔
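For reference, the profile can also be exercised outside Nextflow with the AWS CLI; these checks are illustrative (they assume the AWS CLI is installed) and were not part of the original run:

# confirm the profile resolves to valid credentials
AWS_PROFILE=di aws sts get-caller-identity

# confirm those credentials can read the private bucket
AWS_PROFILE=di aws s3 ls s3://data-intuitive-tmp/test-nextflow-wave-fusion/resources/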

When I manually define aws.accessKey and aws.secretKey in my Nextflow config, the workflow works as intended:

script.sh:

#!/bin/bash

# do not set AWS profile
# export AWS_PROFILE=di

# create config
cat > /tmp/nextflow.config <<EOF
fusion {
    enabled = true
    exportStorageCredentials = true
}

aws {
    accessKey = "..."
    secretKey = "..."
}

wave {
    enabled = true
}
EOF

# run component
DATA=s3://data-intuitive-tmp/test-nextflow-wave-fusion/resources

NXF_VER=23.10.1 nextflow run \
  viash-io/test-nextflow-wave-fusion \
  -r main_build \
  -main-script target/nextflow/method/main.nf \
  -profile docker \
  -latest \
  -w "s3://data-intuitive-tmp/test-nextflow-wave-fusion/work/github_issue" \
  -c /tmp/nextflow.config \
  --input $DATA/input1.txt \
  --multiple_input "$DATA/input1.txt;$DATA/input2.txt" \
  --publish_dir "s3://data-intuitive-tmp/test-nextflow-wave-fusion/output/github_issue"
Script output:

N E X T F L O W  ~  version 23.10.1
Pulling viash-io/test-nextflow-wave-fusion ...
 Already-up-to-date
Launching `https://github.com/viash-io/test-nextflow-wave-fusion` [small_davinci] DSL2 - revision: 33e8c0898e [main_build]
executor >  local (fusion enabled) (2)
[a6/387fb4] process > method:processWf:method_process (run)                [100%] 1 of 1 ✔
[5e/ae32f3] process > method:publishStatesSimpleWf:publishStatesProc (run) [100%] 1 of 1 ✔

Steps to reproduce the problem

To reproduce this problem, you'll need access to a private S3 bucket containing one or more files. The workflow simply copies the input file to the output, so any file will do.

#!/bin/bash

# set AWS bucket and profile
export AWS_BUCKET=s3://path/to/your/s3
export AWS_PROFILE=your_aws_profile

# create config
cat > /tmp/nextflow.config <<EOF
fusion {
    enabled = true
    exportStorageCredentials = true
}

wave {
    enabled = true
}
EOF

# run component
DATA="$AWS_BUCKET/resources"

NXF_VER=23.10.1 nextflow run \
  viash-io/test-nextflow-wave-fusion \
  -r main_build \
  -main-script target/nextflow/method/main.nf \
  -profile docker \
  -latest \
  -w "$AWS_BUCKET/work" \
  -c /tmp/nextflow.config \
  --input "$AWS_BUCKET/resources/input1.txt" \
  --multiple_input "$AWS_BUCKET/resources/input1.txt;$AWS_BUCKET/resources/input2.txt" \
  --publish_dir "$AWS_BUCKET/output"

↑ change AWS_BUCKET, AWS_PROFILE, --input, --multiple_input, and --publish_dir to point to a private S3 bucket containing the files resources/input1.txt and resources/input2.txt.
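If the bucket doesn't contain these files yet, they can be seeded with the AWS CLI first; a minimal sketch, assuming the CLI is installed and the same bucket and profile variables are exported:

export AWS_PROFILE=your_aws_profile
export AWS_BUCKET=s3://path/to/your/s3

# create two small test files and upload them to the expected locations
echo "hello" > input1.txt
echo "world" > input2.txt
aws s3 cp input1.txt "$AWS_BUCKET/resources/input1.txt"
aws s3 cp input2.txt "$AWS_BUCKET/resources/input2.txt"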

Program output

See above.

.nextflow.log: .nextflow.log

Environment

marcodelapierre commented 5 months ago

Thanks for reporting, @rcannood. Pinging @jordeu, as this does indeed seem to be a Fusion-related issue.

jordeu commented 5 months ago

Fusion should pick up the profile when AWS_PROFILE is defined. The problem is that exportStorageCredentials = true does not add AWS_PROFILE to the environment variables exported into the container.
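This is easy to see in isolation: Docker containers do not inherit the host environment unless variables are forwarded explicitly. An illustrative check (not from the original report):

export AWS_PROFILE=di
docker run --rm ubuntu printenv AWS_PROFILE                 # prints nothing: the variable is unset inside the container
docker run --rm -e AWS_PROFILE ubuntu printenv AWS_PROFILE  # prints "di": -e forwards it from the host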

If this is the problem, then until we fix exportStorageCredentials, one workaround should be to add this extra configuration:

docker.runOptions = '-e AWS_PROFILE=$AWS_PROFILE'
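Combined with the reporter's settings, the complete workaround config would look something like this (a sketch; the fusion and wave blocks are copied from the report above):

fusion {
    enabled = true
    exportStorageCredentials = true
}

wave {
    enabled = true
}

docker {
    runOptions = '-e AWS_PROFILE=$AWS_PROFILE'
}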

As an alternative to environment variables, there is also the option to mount the credentials file inside the containers:

docker.runOptions = '-v $HOME/.aws/credentials:/credentials -e AWS_SHARED_CREDENTIALS_FILE=/credentials'
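For completeness, the mounted file is expected to follow the standard AWS shared-credentials INI layout. Note that with a named profile such as di, AWS_PROFILE would presumably still need to be forwarded as well so the right section is selected (values elided, as in the report above):

[di]
aws_access_key_id = ...
aws_secret_access_key = ...

docker.runOptions = '-v $HOME/.aws/credentials:/credentials -e AWS_SHARED_CREDENTIALS_FILE=/credentials -e AWS_PROFILE=$AWS_PROFILE'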