kgutwin closed this issue 4 years ago.
Hi @kgutwin!
Sorry to hear you are having trouble submitting Sarek on AWS. It has been a while since I ran Sarek on AWS like this (we now use the CloudFormation scripts found here and submit from tower.nf).
Just to clarify: are you logged into your EC2 instance? Which AMI are you using right now?
The last time we did it, we used this command:
nextflow run nf-core/sarek -profile docker -r 2.6.1 \
--outdir 's3://our-bucket/results_dir' \
-w 's3://our-bucket/workdir' \
--tracedir 's3://our-bucket/trace_' \
--input 's3://our-bucket/input.tsv' \
--genome 'GRCh38' \
--tools 'Strelka,ASCAT,snpEff' \
-c awsbatch.config \
--awsregion 'us-east-1' \
--igenomes_base 's3://our-bucket/references' \
--awscli '/home/ec2-user/miniconda/bin/aws' \
-resume
In our case, awsbatch.config contains the queue settings, among other things:
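process {
  // Default AWS Batch job queue for all processes
  queue = 'normal'
  // MapReads runs on our 'highmem' queue
  withName: MapReads {
    queue = 'highmem'
  }
  // BamQC runs on our 'long' queue (quoted: bare long is a Groovy keyword)
  withName: BamQC {
    queue = 'long'
  }
}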
process.executor = 'awsbatch'
aws.region = params.awsregion
executor.awscli = params.awscli
FYI, if you want to use iGenomes later: the iGenomes bucket is in the eu-west region, while it looks like you want to run your analysis in us-east. You will need to copy the references to your region, because as far as I know Nextflow currently can't work with both regions at once.
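Copying the references can be done with aws s3 sync. A minimal sketch, assuming Sarek's GRCh38 files sit under s3://ngi-igenomes/igenomes/Homo_sapiens/GATK/GRCh38/ in the public bucket and that you keep the s3://our-bucket/references layout passed as --igenomes_base above; the helper names are hypothetical. Extra arguments go straight to aws s3 sync, so you can preview the copy with --dryrun and name both regions explicitly with --source-region and --region.

```bash
# Hypothetical helpers for mirroring one iGenomes build into your own bucket,
# so --igenomes_base can point at a bucket in the same region as your Batch queue.
# Assumption: the public iGenomes bucket root is s3://ngi-igenomes/igenomes/.

# Source URI for a genome sub-path, e.g. Homo_sapiens/GATK/GRCh38
igenomes_src() {
  printf 's3://ngi-igenomes/igenomes/%s/\n' "$1"
}

# Destination URI: <dest_base>/<genome sub-path>/ (a trailing slash on dest_base is dropped)
igenomes_dst() {
  printf '%s/%s/\n' "${2%/}" "$1"
}

# Copy one genome build; any extra arguments are passed to `aws s3 sync` unchanged.
mirror_igenomes() {
  local genome_path="$1" dest_base="$2"
  shift 2
  aws s3 sync "$(igenomes_src "$genome_path")" \
    "$(igenomes_dst "$genome_path" "$dest_base")" "$@"
}

# Usage (preview first, then drop --dryrun to actually copy):
#   mirror_igenomes Homo_sapiens/GATK/GRCh38 s3://our-bucket/references \
#     --source-region eu-west-1 --region us-east-1 --dryrun
```

With the copy in place, --igenomes_base 's3://our-bucket/references' resolves to the same relative paths the pipeline would otherwise look up in the public bucket.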
From what I can see, you may need to specify where your AWS CLI lives (the --awscli parameter in the command above). The AWS CLI is not installed with Sarek; it needs to be set up beforehand, when you set up your AMI. You can check whether it is there by running:
aws --version
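If the AMI was set up with the Miniconda route below, aws may be installed but not on your PATH, so it can help to check both places. A small sketch, assuming the /home/ec2-user/miniconda/bin/aws location used with --awscli above; find_awscli is a hypothetical helper, and the path it prints is what you would pass to --awscli.

```bash
# Hypothetical helper: print the path of a usable AWS CLI.
# Checks the given location first (default: the Miniconda path used with
# --awscli above), then falls back to whatever `aws` is on PATH.
find_awscli() {
  local candidate="${1:-/home/ec2-user/miniconda/bin/aws}"
  if [ -x "$candidate" ]; then
    printf '%s\n' "$candidate"
  elif command -v aws >/dev/null 2>&1; then
    command -v aws
  else
    echo "AWS CLI not found; install it on the AMI first" >&2
    return 1
  fi
}

# Usage:
#   awscli_path="$(find_awscli)" && "$awscli_path" --version
```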
To install it:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p /home/ec2-user/miniconda
/home/ec2-user/miniconda/bin/conda install -y -c conda-forge awscli
/home/ec2-user/miniconda/bin/aws --version
I am not 100% sure anymore whether you also need to run this afterwards:
aws configure
to set up your credentials.
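For what it's worth: if the EC2 instance has an IAM role (instance profile) attached, the AWS CLI picks up credentials from the instance metadata on its own, so aws configure is mostly useful for setting a default region or when no role is attached. A quick, non-interactive check is aws sts get-caller-identity. The wrapper below is a hypothetical helper around that call, and the region in the usage comment is simply the one from the command above.

```bash
# Hypothetical helper: succeed if the AWS CLI can resolve credentials from
# any source (environment variables, ~/.aws/credentials, or an EC2 instance
# profile), without prompting. Optional argument: path to the aws binary.
awscli_has_credentials() {
  local awscli="${1:-aws}"
  "$awscli" sts get-caller-identity >/dev/null 2>&1
}

# Usage:
#   if awscli_has_credentials /home/ec2-user/miniconda/bin/aws; then
#     echo "credentials found"
#   else
#     aws configure   # interactive: access key, secret key, region, output format
#   fi
#   # Setting only the default region, non-interactively:
#   aws configure set region us-east-1
```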
Does this solve it?
It looks like I missed the step of creating the custom AMI, which I discovered in the main Nextflow docs. I'll pursue that - thanks!
Sounds good 👍 Shout if it doesn't work afterwards.
I also ran into this problem. AWS Batch starts instances (from an AMI with the AWS CLI), and those instances then start the Docker container (image: nfcore/sarek:2.6.1). The problem is that this image doesn't include the AWS CLI, which is needed to sync files.
A simple workaround is to build a new image based on nfcore/sarek:2.6.1 with the AWS CLI installed. However, it would be better if it were installed in the official image (don't forget nfcore/sarekvep as well).
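A minimal sketch of that workaround, assuming conda is on the PATH inside nfcore/sarek:2.6.1 and reusing the conda-forge awscli package from the install steps earlier in this thread. sarek_awscli_dockerfile is a hypothetical helper that only prints a Dockerfile; the image tag in the usage comment is a placeholder. The image has to be pushed somewhere AWS Batch can pull from, and the pipeline then has to be pointed at it (for example with a container override in your config; check which processes use which image). Since the CLI would then live inside the container, --awscli probably should not point at a host path in that setup.

```bash
# Hypothetical helper: print a Dockerfile that adds the AWS CLI to a Sarek image.
# Assumption: conda is on PATH in the base image.
sarek_awscli_dockerfile() {
  local base_image="${1:-nfcore/sarek:2.6.1}"
  cat <<EOF
FROM ${base_image}
# Install the AWS CLI from conda-forge and drop the package cache
RUN conda install -y -c conda-forge awscli && conda clean -a -y
EOF
}

# Usage (the tag is a placeholder; push it to a registry AWS Batch can pull from):
#   sarek_awscli_dockerfile nfcore/sarek:2.6.1 > Dockerfile
#   docker build -t your-registry/sarek-awscli:2.6.1 .
#   docker run --rm your-registry/sarek-awscli:2.6.1 aws --version
```

The same helper can be pointed at the nfcore/sarekvep image by passing it as the first argument.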
Hi @XLuyu!
This is not specific to Sarek; it would be relevant to all nf-core pipelines, so it may be worth discussing with @nf-core/core.
In the meantime, you could also try Tower Forge. It sets up the AWS Batch resources for you, so you don't have to worry about the AWS CLI anymore, and you can easily monitor all your runs. See: https://help.tower.nf/docs/compute-envs/aws-batch/
Sorry you are having trouble with AWS Batch. Hopefully this helps a bit 🙂
With a fresh install of Nextflow v20.10.0, I am trying to run the Sarek test pipeline using the awsbatch executor. My command line is:
Nextflow is able to submit jobs to the Batch queue, but they all fail with the message
This is happening because Nextflow is specifying the following command to the container when it starts:
When I launch the nfcore/sarek:2.6.1 Docker container manually, I can see that the /home directory is empty, and the AWS CLI does not seem to be installed anywhere. Should the AWS CLI be added to the list of packages installed by Conda? Or am I expected to build a custom container image including this tool? The Sarek documentation on AWS Batch implies a custom AMI, which doesn't seem to make sense in this case.
Thanks for your help!