sponte / Powershell.Deployment

Powershell scripts that automated component deployment. Supports windows services (srvany.exe, NServiceBus, TopShelf) and websites.
MIT License
15 stars 8 forks source link

Improvement: Implement Mutex lock for IIS, so parallel deployments don't error but queue up naturally #32

Closed slawomir-brzezinski-at-interxion closed 5 years ago

slawomir-brzezinski-at-interxion commented 6 years ago

This has been proposed here as one way of speeding up massive deployments, and doing it relatively safely.

This should shave a good amount of time from massive parallel deployments, as the scripts would stop relying on coarse grained Mutex of deployment tools.

RavindraSolanki commented 5 years ago

Hi @sponte, @taliesins, I have a PR to mitigate the issue. Are you happy with the changes? If yes can this get merged?

slawomir-brzezinski-at-interxion commented 5 years ago

Must say I'm not a fan of preemptive retrying, without knowing what the error is. Not a fail-fast strategy. My proposal was a mutex, so that just failures due to parallel execution are prevented, i.e. just make the code thread-safe.

RavindraSolanki commented 5 years ago

Hi @slawomir-brzezinski-at-interxion, I don't think the mutex would help in its current implementation of the Powershell.Deployment. All those deployment scripts are local to the application being shipped. In your opinion, you assume creating a 'mutex' outside of the boundaries of the applications to restrict to a single action at a time in IIS. I don't think it is desirable.

slawomir-brzezinski-at-interxion commented 5 years ago

@RavindraSolanki , yes I was suggesting using a global mutex (one of inter-process synchronization primitives that operating systems offer, e.g. windows, posix), but giving it a name that will identify the intent to modify IIS configuration, and use it for as small piece of code as possible. Of course this assumes everything is deployed through the same code, but I think this is the only solution that does not add possibility of new, completely unknown, undesired consequences, without making developers aware (apart from IIS providing a thread-safe API).

RavindraSolanki commented 5 years ago

It might be workable with a mutex. But in the context of microservice apps, we might different apps with different versions of Powershell.Deployment. I'll try to get a mutex implemented by the retry mechanism is necessary too.

slawomir-brzezinski-at-interxion commented 5 years ago

Re:

in the context of microservice apps, we might different apps with different versions of Powershell.Deployment

The retry mechanism would also rely on being deployed on all the instances to truly make the problem go away, so I don't see a difference.

the retry mechanism is necessary too.

IMHO it is too opinionated of an approach for a generic piece of software. I would certainly not recommend a commercial use of a software that takes sweeping decisions on any errors without knowing what happenned (and making sure that someone resolves the question). A retry might actually make the problem worse, be it directly, or by allowing more time to pass without anyone looking at the problems it has caused.

RavindraSolanki commented 5 years ago

@slawomir-brzezinski-at-interxion is it again your 'fail fast' religion. In the case of IIS it might be recoverable. Taking into account that the guys from Octopus did think it is an acceptable one.

slawomir-brzezinski-at-interxion commented 5 years ago

@slawomir-brzezinski-at-interxion is it again your 'fail fast' religion.

:) let's try to stay away from statements irrelevant to the discussion at hand. I think I was giving concrete examples so I think that it's enough to say that, in your opinion, it is fine to ignore them without knowing what consequences such an action has, and it is fine to stealthily introduce it to a generic piece of software like this without making the user aware.

In the case of IIS it might be recoverable.

It might or it might not. In all cases, a human should attend a deployment and make the decision to retry. It's better to learn before making decision. Better yet, understand the error and fix it, so that it never happens.

Taking into account that the guys from Octopus did think it is an acceptable one.

Hmmm, really? Where? I am actually finding them agreeing with my 'religion':

Excerpt from this feature request:

Q: I would like to implement retry options in case of failure .... A: At the moment this is only possible through the use of the Guided Failure feature. This feature will stop the deployment once a step fails, allowing you to review it and then retry it, which is an approach we think is more appropriate than doing a blind retry

slawomir-brzezinski-at-interxion commented 5 years ago

Okay, after having a closer look at the changes, I can see that the retries are performed on actions that all seem to be idempotent, that is, retrying them should not worsen the situation in any way. Because of that, I'm no longer thinking that retrying implementation is a blocker, since I see that this PR makes the package only better.

Just for completeness, I do think mutex would offer better option even for the original pretext of this work - speeding up mass-deployments, since a mutex minimizes the time to wait for conflicts (by only waiting as much as needed, as opposed to the random between 2-10seconds, and reducing tries to just 1), and overall would serve a higher number of parallel deployment processes before hitting the 'give up' criteria (wait timeout for mutex, maximumNumberOfRetries for the retry solution).

taliesins commented 5 years ago

I like the idea of a mutex. I would be thinking more along the lines of a file lock vs interprocess communication.

There is one possible drawback to the mutex. What happens if something which does not use the mutex is changing the file.

So to be truly safe and because its a "transient error" I would vote for retry block. We might be able to shorten the retry period to something between 1 to 2 seconds. I would also only catch the specific error that we want to retry and not all errors.

@sponte I think it's time we give these 2 contributor access. They are one of the biggest users of this powershell library.

sponte commented 5 years ago

Totally agree. Go ahead.

Kind regards, Stanislaw Wozniak

On 18 Oct 2018, at 6:28 am, Taliesin Sisson notifications@github.com<mailto:notifications@github.com> wrote:

I like the idea of a mutex. I would be thinking more along the lines of a file lock vs interprocess communication.

There is one possible drawback to the mutex. What happens if something which does not use the mutex is changing the file.

So to be truly safe and because its a "transient error" I would vote for retry block. We might be able to shorten the retry period to something between 1 to 2 seconds.

@spontehttps://github.com/sponte I think it's time we give these 2 contributor access. They are one of the biggest users of this powershell library.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/sponte/Powershell.Deployment/issues/32#issuecomment-430879836, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAC5bKqraeZd_3Iyyc8H76sKbOFZMFquks5umBF9gaJpZM4SYK8K.

RavindraSolanki commented 5 years ago

I've made some changes to look the best of the two worlds.

For all the IIS actions run through Powershell.Deployments a mutex is used to synchronise the execution. However, I'm conscious something might happen outside Powershell.Deployment and a retry-mechanism is in place for this eventuality.

RavindraSolanki commented 5 years ago

Hey @taliesins, please can you give @slawomir-brzezinski-at-interxion and @RavindraSolanki the contributor privileges.

RavindraSolanki commented 5 years ago

To answer @taliesins about

I would also only catch the specific error that we want to retry and not all errors

, I've found out when trying to run at the same time multiple web-app deployments that it can fail at different places (e.g. AppPool, WebApp, setting a property of the earlier) with different error messages.

Else the second element 'Mutex or filelock', I think it does not matter.

slawomir-brzezinski-at-interxion commented 5 years ago

Closing, as it's fixed by https://github.com/sponte/Powershell.Deployment/pull/36 , that just got merged