seqeralabs / nf-tower

Nextflow Tower system
https://tower.nf
Mozilla Public License 2.0

CannotStartContainerError #305

Open adomingues opened 3 years ago

adomingues commented 3 years ago

Hi all,

I don't know if this is the right place to ask, but here it goes. I have set up a compute profile following the AWS Batch instructions, including the IAM Forge policies, but stumbled into an error while running the nextflow-io/hello pipeline:

 Workflow execution completed unsuccessfully
The exit status of the task that caused the workflow execution to fail was: -
CannotStartContainerError: Error response from daemon: failed to initialize logging driver: failed to create Cloudwatch log stream: RequestError: send request failed
caused by: Post https://logs.us-east-2.amazonaws.com/: dial tcp 10.20.10.16:443: i/o time

From the AWS console I can tell that an instance is initially created but then it somehow fails with the above error.

FYI I am an AWS n00b, but this was done with our AWS IT support, who set up all the permissions as we went along. The only "deviation" from the docs while setting up the compute environment was that a new VPC (+ subnet and security group) was created specifically for nf-tower jobs.

Thank you for developing nf-tower and for your help!

pditommaso commented 3 years ago

Hello, I assume the Cloudwatch permissions have been granted to the launching user.

The error is quite unusual. The message dial tcp 10.20.10.16:443: i/o time makes me think it is something related to the VPC or security group settings. Have you specified your custom VPC ID and security groups in the Tower CE (compute env) advanced settings (bottom of the page)?
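One way to check the VPC/security group hypothesis is to test, from an instance inside the same subnet, whether a TCP connection to the CloudWatch Logs endpoint named in the error can be opened at all. The following is only a minimal sketch, assuming Python 3 is available on that instance; `can_reach` is a hypothetical helper written for illustration and is not part of Tower, Nextflow, or any AWS tooling.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, refused connections, and DNS failures are all OSError subclasses.
        return False
```

Calling `can_reach("logs.us-east-2.amazonaws.com")` (the host taken from the error message) and getting `False` from inside the custom subnet would be consistent with a routing or security group problem rather than a permissions problem. It only tests reachability, so a `True` result does not rule out missing CloudWatch permissions.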

adomingues commented 3 years ago

Hi @pditommaso. You were spot on with your hint. I did specify a custom VPC ID and security groups in the advanced settings. After writing the bug report I went back to Tower, created a new compute environment changing the VPC to our default VPC (I was using a custom VPC set up for tower-nf only), started a new job, and voilà:

N E X T F L O W  ~  version 21.04.0
Pulling nextflow-io/hello ...
downloaded from https://github.com/nextflow-io/hello.git
Launching `nextflow-io/hello` [insane_visvesvaraya] - revision: e6d9427e5b [master]
Monitor the execution with Nextflow Tower using this url https://tower.nf/watch/22sKKNRjAteTJt
[75/4ce5c8] Submitted process > sayHello (3)
[b1/87d22a] Submitted process > sayHello (4)
[4c/97c51e] Submitted process > sayHello (1)
[67/fcef84] Submitted process > sayHello (2)
Hola world!
Hello world!
Bonjour world!
Ciao world!

Since I have your attention, one related question:

image

Cheers!

(Feel free to close the issue as it's solved)

pditommaso commented 3 years ago

> started a new job, and voilà:

good

> There is an option to delete those that are successfully created but not the "bad ones"

There should be a delete button. Is there not?

adomingues commented 3 years ago

The "delete" button is only present for those environments that were successfully created. In the screenshot above the env mike-test-4-vpc has it (I used it to launch a job), but the button is not present in AWS batch launch environment which failed because of permissions.

pditommaso commented 3 years ago

I think we have identified a glitch in the UI. We will patch it soon.

On Wed, 19 May 2021, 13:44 A. Domingues, @.***> wrote:

> The "delete" button is only present for those environments that were successfully created. In the screenshot above the env mike-test-4-vpc has it (I used it to launch a job), but the button is not present in AWS batch launch environment which failed because of permissions.
