koltyakov / gosip

⚡️ SharePoint SDK for Go
https://go.spflow.com
MIT License

How to handle 'An existing connection was forcibly closed by the remote host' #28

Closed swapnilgaonkar closed 4 years ago

swapnilgaonkar commented 4 years ago

We have been seeing the error below on a couple of our domains for the last 2 weeks. It's a known error and an intermittent one: "read tcp 10.230.22.132:55103->20.190.132.116:443: wsarecv: An existing connection was forcibly closed by the remote host."

We are already using the 'RetryPolicies' mechanism to handle error responses. Example:

    RetryPolicies: map[int]int{ // merged with default policies
        500: 2, // overwrites default
        404: 5, // overwrites default
    }

Can you help us work out how to handle the above error in our retry logic?

koltyakov commented 4 years ago

I'm afraid retry policies won't work here (at least not as they are currently implemented), as they rely on HTTP status codes in responses, not on network connectivity failures like this one.

What is the fingerprint of the issue? I know you mentioned that it's inconsistent, and it's 100% network related (maybe a load balancer or routing device issue, or a proxy that fails from time to time), but perhaps you can detect a pattern?

I checked some logs for processes that constantly talk to different environments and couldn't find any message with the same error. I'm not sure how to reproduce it in order to think out a strategy.

Btw, which authentication strategies and SharePoint versions are you using?

Also, since you started experiencing this in the last 2 weeks, did anything unusual happen to the environment: SharePoint farm, network, deployment locations, firewalls, policies, etc.?

swapnilgaonkar commented 4 years ago

Thanks for the reply

We are working with SharePoint Online and use the SAML authentication strategy.

We didn't observe anything unusual in our environment over the last 2 weeks. This looks like an issue on Microsoft's side.

We failed to find any pattern in the failures. Different APIs failed at different times with the same error.

We are planning to handle this error (based on the error string returned by the library) with some internal retry logic with a delay.

We can close this thread for now.

koltyakov commented 4 years ago

Going to close. Please reopen if you face this again. However, it looks like it was an external factor, and an environment-specific one. Such issues are not always feasible to troubleshoot or work around. Sometimes they can be, as was the case with 503s and random silent failures on the MS end.

Recently I was dealing with an ADFS behind WAP environment where no configuration changes or audit were available, and 403s appeared from time to time. I ended up adding retries on 403 and some logic in the OnRetry hook.