Open mgoodfellow opened 1 week ago
Hi @youssefhassan
We're seeing another spike in 503 error rates, again with the delayed connect error 113:
Also seeing spikes in response times across most endpoints.
It seems to be recovering again: the spike ran from 16:48 to 17:26 UTC, and response times are now recovering as well.
I'm keeping an eye on this, and I really appreciate the reports; they help a lot. I'll keep this thread open, so please share whenever you see spikes of 500s. Hopefully we will find a fix soon.
Hi,
Related to: https://github.com/soundcloud/api/issues/311
We are still seeing an increased error rate on the API:
These started around 20th August and have been ongoing.
The errors take one of two forms:
upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: 113
OR
upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: 111
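For anyone triaging these in logs, here is a rough sketch of how the two messages could be told apart. It assumes the proxy emitting them runs on Linux, where errno 113 is EHOSTUNREACH ("No route to host") and 111 is ECONNREFUSED ("Connection refused"); the function name `classify_connect_error` and the lookup table are illustrative only, not part of any SoundCloud tooling.

```python
import re

# Linux errno values. Assumption: the upstream proxy runs on Linux;
# other platforms number these errors differently.
LINUX_ERRNO = {
    111: ("ECONNREFUSED", "Connection refused"),
    113: ("EHOSTUNREACH", "No route to host"),
}

# Matches the trailing "delayed connect error: <n>" part of the response body.
_PATTERN = re.compile(r"delayed connect error: (\d+)")


def classify_connect_error(body: str):
    """Return (errno, name, meaning) for a matching error body, else None."""
    match = _PATTERN.search(body)
    if match is None:
        return None
    code = int(match.group(1))
    name, meaning = LINUX_ERRNO.get(code, ("UNKNOWN", "unrecognised errno"))
    return code, name, meaning
```

Counting the two codes separately per time window may help show whether a spike is dominated by unreachable hosts (113) or refused connections (111).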
The errors are sporadic and spiky, so we generally see them clustered, and retries often succeed:
We are seeing these errors primarily on Tracks, Reposts, and Profiles. They affect both read-only (GET) and mutation (PUT, POST, etc.) endpoints.
It would be good to understand these errors, as we retry only idempotent reads, not mutations.
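To make the retry policy concrete, here is a minimal sketch of the kind of client-side logic described above: 503s are retried with exponential backoff and jitter for reads only, while mutations are sent exactly once. Everything here is hypothetical (`call_with_retry`, the `send` callable, the attempt count and delays); it is not our actual client code, just an illustration of the approach.

```python
import random
import time
from typing import Callable, Tuple

# Methods treated as safe to retry in this sketch (assumption: reads only).
IDEMPOTENT_READS = {"GET", "HEAD"}


def call_with_retry(
    method: str,
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry 503 responses, but only for idempotent reads.

    send is a hypothetical zero-argument callable that performs the request
    and returns (status_code, body). Mutations get a single attempt.
    """
    retryable = method.upper() in IDEMPOTENT_READS
    attempts = max(1, max_attempts) if retryable else 1
    for attempt in range(1, attempts + 1):
        status, body = send()
        # Return on success, on any non-503 status, or when out of attempts.
        if status != 503 or attempt == attempts:
            return status, body
        # Exponential backoff with full jitter before the next attempt.
        sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```

With this shape, a PUT or POST that hits one of the 503s surfaces the failure to the caller immediately, which is why the errors on mutation endpoints are the ones that hurt most.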