tinkerbell / tink

Workflow Engine for provisioning Bare Metal
https://tinkerbell.org
Apache License 2.0

Provisioner stays active if the worker fails. #106

Closed · thomcrowe closed this 4 years ago

thomcrowe commented 4 years ago

When the datacenter didn't have hosts available for both the provisioner and the worker, terraform apply spent over 3 minutes creating the provisioner before failing, and the worker was never started. After the failure, the provisioner was still running until I ran terraform apply again with the new datacenter details. If the worker fails to provision, it would be nice for the provisioner to shut down as well.
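
For anyone who hits the same thing, a minimal cleanup sketch using plain Terraform commands (the `facility` variable name is an assumption about the quick start's inputs, not something confirmed here):

```sh
# Tear down whatever the failed apply left behind (the orphaned provisioner).
terraform destroy

# Re-run against a datacenter that has capacity for both machines.
# NOTE: "facility" is an assumed variable name; check the quick start's variables.tf.
terraform apply -var 'facility=<datacenter-with-capacity>'
```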

parauliya commented 4 years ago

@thomcrowe, I don't agree with failing the provisioner when a worker fails, since the provisioner is the one responsible for creating workflows for different workers. With multiple workers there can be multiple workflows for different workers, so if one worker fails we can't fail the provisioner, because it has workflows for other workers that still need to run.

Please let me know if I've understood your concern correctly and, if so, close the issue; otherwise, please explain a bit more about what you mean.

grahamc commented 4 years ago

Is this when using the terraform-based quick start? In that case, I think it would be reasonable to set the expectation that users are familiar with Terraform ahead of time. Given that expectation, it is reasonable that a user would understand they needed to destroy the provisioner.

I can imagine a case where provisioning a worker fails for an unrelated reason. If that reason is retryable, tearing down the other machine would add a significant slowdown to the process of getting started.
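
In that retryable case, a sketch of retrying only the worker with stock Terraform targeting (the resource address `metal_device.tink_worker` is hypothetical, not necessarily what the quick start declares):

```sh
# Re-attempt only the failed worker without touching the provisioner.
# NOTE: metal_device.tink_worker is an assumed resource address;
# substitute the real one from `terraform state list`.
terraform apply -target=metal_device.tink_worker
```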

thomcrowe commented 4 years ago

Makes sense.

Thanks y'all!
