tinkerbell / tink

Workflow Engine for provisioning Bare Metal
https://tinkerbell.org
Apache License 2.0
917 stars 134 forks source link

Improved logic for worker connections #196

Closed nathangoulding closed 4 years ago

nathangoulding commented 4 years ago

Description

There are a couple issues manifesting themselves which need resolution:

Why is this needed

Checklist:

I have:

parauliya commented 4 years ago

Hi @nathangoulding, There are few things I would like to discuss about :

  1. I think adding gRPC stream between tink-worker and tink-server is not a good solution because of the following reasons

           -   This will convert the protocol between worker and server from stateless to stateful which will add the condition based on stream connectivity.
           -   If there are multiple workers then server has to maintain sessions for each worker.
           -   This will complicate things between tink-worker and tink-server.
           -   Performance wise stream does not give any benefit.
  2. For this issue we can do few optimistion and fix the sleep so that terminal should not filled up with too many logs.

  3. When fetching workflow tink should only return workflows that are in progress or pending state and that too for that particular worker. This change is very much required at the server side.

  1. There is one more thing which is not there in this issue but we are planning for it which is as follows: As per the current implementation the tink-worker container on theworker machine exits as soon as it executes all the tasks assigned to it but we would like that container to keep running and polling to the server if there is a new workflow created for it so that it should start executing that wrokflow automatically. I would like to raise an issue for this enhancement and then do it after the fix of this issue.

Please let me know your views on the same.

nathangoulding commented 4 years ago

Responses inline:

This will convert the protocol between worker and server from stateless to stateful which will add the condition based on stream connectivity.

If there is an established session it will use it; otherwise, it will wait for a worker to fetch it like it does today.

If there are multiple workers then server has to maintain sessions for each worker.

Correct, this is how it works with osie-runner and hegel in production today.

This will complicate things between tink-worker and tink-server.

I'm not sure I understand this justification.

Performance wise stream does not give any benefit.

The overhead that will be eliminated by converting from polling to a gRPC stream is:

In addition, actions are started immediately instead of delayed until after the polling period.

nathangoulding commented 4 years ago

Responses to the bullets inline:

For this issue we can do few optimistion and fix the sleep so that terminal should not filled up with too many logs.

We should convert to a gRPC stream, not use polling.

When fetching workflow tink should only return workflows that are in progress or pending state and that too for that particular worker. This change is very much required at the server side.

Yes, correct.

There is one more thing which is not there in this issue but we are planning for it which is as follows: As per the current implementation the tink-worker container on the worker machine exits as soon as it executes all the tasks assigned to it but we would like that container to keep running and polling to the server if there is a new workflow created for it so that it should start executing that wrokflow automatically. I would like to raise an issue for this enhancement and then do it after the fix of this issue.

I'm fine to break this issue into a separate task, i.e. tink-worker container adds a parameter that causes it to exit on completion, or wait for additional actions.