Background context
We have recently started to see errors logged when services seemingly cannot set keys in the Redis configuration cache fast enough. This always seems to correlate with periods of high load. We do not see the same problem with the concurrency cache, which is managed in a similar way.
The error surfaced in logs is in the format:
Exception occured while saving response in cache, return data from API.
Error: 'The timeout was reached before the message could be written to the output buffer, and it was not sent, command=SETEX, timeout: 5000, inst: 0, qu: 0, qs: 0, aw: False, bw: CheckingForTimeout, rs: NotStarted, ws: Initializing, in: 0, last-in: 0, cur-in: 0, sync-ops: 0, async-ops: 1, serverEndpoint: ********, conn-sec: n/a, mc: 1/1/0, mgr: 10 of 10 available, clientName: **********(SE.Redis-v2.6.86.49666),
IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=60,Free=32707,Min=2,Max=32767),
POOL: (Threads=60,QueuedItems=6,CompletedItems=149190775),
v: 2.6.86.49666 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
Only the properties API sees this problem.
Specification
Issue likely in Reapit.Packages.ConfigurationClient
Try to replicate what might be going on here. The configuration service itself appears to be returning the data, but the cached values aren't getting set. This shouldn't cause any issues with the response payloads, as there are fallback mechanisms in this package, but we do need to understand what's going on. The issue seems to have got progressively worse over the last 6 months, which is likely down to increased request volume to that service.
A number of articles online point to thread starvation as a possible cause of this kind of timeout.
The cache size we use should be able to handle 65,000 connected clients, so I don't think the problem is with the cache itself
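One way to check the thread starvation theory against the logs we already have is to compare the Busy and Min values in the IOCP and WORKER segments of each timeout message. The Timeouts article linked in the error treats Busy being greater than Min as a sign that the thread pool is having to inject new threads, which the runtime throttles. The following is a minimal sketch, not part of Reapit.Packages.ConfigurationClient; the function names are hypothetical and the sample string is copied from the error above.

```python
import re

# Matches segments such as "WORKER: (Busy=60,Free=32707,Min=2,Max=32767)".
POOL_PATTERN = re.compile(
    r"(IOCP|WORKER): \(Busy=(\d+),Free=(\d+),Min=(\d+),Max=(\d+)\)"
)


def thread_pool_stats(message):
    """Parse the IOCP and WORKER counters out of a timeout message."""
    stats = {}
    for name, busy, free, minimum, maximum in POOL_PATTERN.findall(message):
        stats[name] = {
            "busy": int(busy),
            "free": int(free),
            "min": int(minimum),
            "max": int(maximum),
        }
    return stats


def starved_pools(message):
    """Return the pools whose busy thread count exceeds the configured minimum."""
    return [
        name
        for name, counters in thread_pool_stats(message).items()
        if counters["busy"] > counters["min"]
    ]


# Thread pool portion of the logged error (hypothetical variable name).
sample = (
    "IOCP: (Busy=0,Free=1000,Min=2,Max=1000), "
    "WORKER: (Busy=60,Free=32707,Min=2,Max=32767),"
)

print(starved_pools(sample))  # prints ['WORKER']
```

Running this over the logged errors from the properties API and from a service that does not show the problem would indicate whether worker thread pressure is specific to the affected service. In the sample above, 60 busy worker threads against a minimum of 2 is consistent with the starvation theory, but it is not proof on its own.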