huserben / TfsExtensions

Extensions for TFS 2015+ such as custom Widgets (require TFS 2017) and Build Tasks
MIT License

VSTS unreliability #98

Closed daughey closed 5 years ago

daughey commented 5 years ago

This is an enhancement request to deal with VSTS sometimes dropping connection (even from its own hosted agents).

Could you please implement automatic retry when awaiting queued builds? VSTS has a bad habit of giving an ECONNRESET or other error that means VSTS dropped the ball. It would be good if the Trigger Build Task could automatically retry upon connection errors instead of failing on the first error.

huserben commented 5 years ago

Hi @daughey

Thanks for this input.

The task uses the azure-devops-node-api (previously named vso-node-api, https://github.com/Microsoft/azure-devops-node-api) to establish the connection to the server. I will check whether it supports any retry options out of the box that could be used.

Do you happen to have a log showing such a dropped connection that I could refer to?

I will let you know once I know more.

huserben commented 5 years ago

Hi again @daughey

just to clarify, which version of the task are you using exactly?

arnochauveau commented 5 years ago

@huserben I'm having the same issue on version 3.*

I get the following log:

digital-client-app-dvlp_-_18_11_2_-_pipelines

If there's anything else I can do to help you resolve this issue, please let me know!

huserben commented 5 years ago

Hi @arnochauveau

thanks for the log, I'm currently looking into the issue and how to properly solve it.

Could you tell me what wait time between the checks you have configured?

arnochauveau commented 5 years ago

The wait time is currently set to 30s for all our build triggers.

huserben commented 5 years ago

Ok. As a workaround for now, if this is a severe problem for you, you could try increasing that time so the task does not put too much load on the VSTS instance.
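To illustrate why the wait time matters: the task checks the state of the queued builds repeatedly, so a shorter wait means more requests to the server. The following is only a minimal sketch of such a polling loop, not the task's actual code. `waitForBuild`, `isBuildFinished`, and `maxChecks` are hypothetical names introduced for this example, and `isBuildFinished` stands in for whatever status request is really made.

```typescript
// Hypothetical polling loop (an assumption for illustration, not the task's source).
// `isBuildFinished` stands in for the request that checks the build status.
async function waitForBuild(
  isBuildFinished: () => Promise<boolean>,
  waitTimeSeconds: number,
  maxChecks: number = Infinity
): Promise<number> {
  let checks = 0;
  while (checks < maxChecks) {
    checks++;
    // One request to the server per iteration.
    if (await isBuildFinished()) {
      return checks;
    }
    // Pause before the next check; a larger value means fewer requests.
    await new Promise<void>((resolve) => setTimeout(resolve, waitTimeSeconds * 1000));
  }
  throw new Error(`Build did not finish after ${checks} checks`);
}
```

With a wait time of 30s, a build that takes ten minutes causes roughly 20 status requests; at 60s it is roughly 10.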

arnochauveau commented 5 years ago

We'll try that. Thanks for the tip!

huserben commented 5 years ago

Hi all

quick update: I've implemented a retry mechanism, so if a request to the server fails, the task will retry it up to 4 times before it fails.
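Roughly, the idea looks like the sketch below. This is a simplified illustration under stated assumptions rather than the code in the task: `withRetry`, `delayMs`, and the `Logger` type are hypothetical names, and "up to 4 times" is interpreted here as one initial attempt plus at most 4 retries. Every failed attempt is logged together with its error.

```typescript
// Hypothetical helper (an assumption for illustration, not the task's source).
type Logger = (message: string) => void;

async function withRetry<T>(
  request: () => Promise<T>,
  maxRetries: number = 4,
  delayMs: number = 1000,
  log: Logger = console.log
): Promise<T> {
  let lastError: unknown;
  // Attempt 0 is the initial request; attempts 1..maxRetries are retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await request();
    } catch (error) {
      lastError = error;
      // Log every failure so the build log shows each retry and its cause.
      log(`Request failed (attempt ${attempt + 1} of ${maxRetries + 1}): ${String(error)}`);
      if (attempt < maxRetries) {
        // Short pause before retrying, so a struggling server is not hit again immediately.
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // All attempts failed: surface the last error so the task fails.
  throw lastError;
}
```

A call to the server would then be wrapped as `withRetry(() => someRequest())`, where `someRequest` is a placeholder for any function that returns a promise.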

I need to do some further testing, but I expect the updated version to be available sometime tomorrow.

daughey commented 5 years ago

Looking forward to it!

Sorry I missed your responses - victim of the "focused" inbox in Outlook.

In response to your question (which probably doesn't matter anymore), I'm on version 3.0.3. Samples of failed responses have been cleaned up by the retention policy. If you're still interested, I can post one if it happens again.

huserben commented 5 years ago

Hi @daughey @arnochauveau

I just uploaded a new version of all the tasks that contains my retry change. However, I wasn't really able to test this in my environment because I didn't manage to force the error to happen. I propose leaving the issue open for a while so you can give me feedback on whether it seems fixed (meaning it doesn't happen anymore) or still fails.

Every retry is logged, along with the error that caused it, so you should be able to see whether the task tried multiple times and whether a later request succeeded after the first one failed.

Please let me know if it still fails (ideally with some log output), or if you see that the retry now works, so that I can close the issue.

Thanks both of you for reporting this and helping to improve the task.

arnochauveau commented 5 years ago

We will take it for a spin in our Develop environment, although since increasing the wait time to 60s we haven't had any more problems.

arnochauveau commented 5 years ago

I've lowered the wait time to 1s and it timed out as expected. Your retry mechanism seems to be working nicely. The issue can be closed as far as I'm concerned.

digital-client-app-dvlp_-_18_11_10_-_pipelines

huserben commented 5 years ago

Ok, thank you very much for the super fast feedback.

I'll therefore close the issue. @daughey, if you feel it's not solved for you, please do reopen it.

Thanks again both of you for the feedback and please let me know via a new issue if you have any other suggestions for improving the task.

daughey commented 5 years ago

Excellent. Thank you very much - both of you!