Closed peaeater closed 1 week ago
maybe up to a hundred thousand requests per hour?
That's ~30 requests/s, which isn't a lot for a proxy to handle. The specs you listed should be more than sufficient to handle it.
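For reference, the conversion behind that estimate can be checked in a couple of lines. This is only a sketch; the function name is made up for illustration, and the result is an average, so short bursts could be higher.

```python
def requests_per_second(requests_per_hour: float) -> float:
    """Convert an hourly request count to an average per-second rate."""
    return requests_per_hour / 3600.0


rate = requests_per_second(100_000)
print(f"{rate:.1f} requests/s")  # prints "27.8 requests/s"
```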
A connection could not be established within the configured ConnectTimeout.
This indicates that YARP wasn't able to establish new connections with your backend destinations. This may indicate a networking/connectivity issue between your servers, or your destination servers not accepting connections for some reason. It's hard to provide more info from YARP as all we see is that connections aren't going through.
Does the issue only appear at high load? Can you establish new connections to backend servers via other means while YARP can't?
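One rough way to answer the second question is to attempt a plain TCP connection to a backend from the proxy machine while the 504s are occurring. The sketch below assumes Python is available on the server; `probe_connect` is a hypothetical helper name, and the host and port you pass in are whatever your backend site is bound to.

```python
import socket
import time


def probe_connect(host: str, port: int, timeout_s: float = 5.0):
    """Try to open a new TCP connection and report how it went.

    Returns (ok, elapsed_seconds, error_text). error_text is empty on success.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        # Covers refused connections, timeouts, and name resolution failures.
        return False, time.monotonic() - start, f"{type(exc).__name__}: {exc}"
```

Calling it with the same address YARP uses for a destination, and comparing the elapsed time against the configured ConnectTimeout, would show whether new connections are slow or failing outside of YARP as well. It only tests the TCP handshake, not whether the site answers HTTP requests.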
The issue only appears at high load, and has happened twice at about the same threshold (where "threshold" is 6 sites being proxied with ~30 requests/s).
The backend destinations are on the same server, also served through IIS, so there's no connectivity issue between servers here. As soon as the sites are un-proxied they respond normally, plus all of them act the same way when this problem occurs, so I don't suspect they are the problem. (All of them are ASP.NET 6 or 8 websites.)
It's also the case that when this problem occurs and I remove one or more sites from the proxy list, the problem doesn't clear up right away; the remaining sites don't go back to responding normally through YARP. Instead, the same problem persists for quite some time. It's as if some kind of resource is exhausted and takes a long time to recover, even though YARP's CPU and memory usage are relatively low. Restarting the YARP website, recycling its application pool, and/or killing its tasks in Task Manager doesn't help.
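One way to look for evidence of this kind of exhaustion would be to count TCP connections by state while the problem is happening. The sketch below assumes the Windows `netstat -an` layout, where TCP rows have four columns (protocol, local address, foreign address, state); `count_tcp_states` is a hypothetical helper name. It is a diagnostic aid, not a confirmed explanation of the 504s.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP rows per connection state in `netstat -an` text.

    Assumes the Windows column layout: Proto, Local Address,
    Foreign Address, State. Other rows (headers, UDP) are skipped.
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "TCP":
            states[fields[3]] += 1
    return states


def read_netstat() -> str:
    """Run `netstat -an` and return its text output (not called here)."""
    return subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    ).stdout
```

Running `count_tcp_states(read_netstat())` during an incident and again after things recover would show whether a very large number of connections is sitting in one state.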
I should also mention that each of the proxied sites is being rate limited, with a non-partitioned fixed window rule of 75 requests per 10 seconds and a queue limit of 25. I can't imagine why that would matter, but who knows.
By non-partitioned, you mean that the limit applies to all clients together, or is it based on e.g. IP? Does the issue persist if you remove the rate limiting?
A connection could not be established within the configured ConnectTimeout.
This is a pretty generic error indicating that we weren't able to establish new connections. Do you have any corresponding logs from the backend servers indicating why they're not accepting connections?
By non-partitioned, you mean that the limit applies to all clients together, or is it based on e.g. IP?
Non-partitioned meaning the limit applies to all clients. YARP returns 429 status codes when a client has hit the rate limit.
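To make the numbers concrete, here is a minimal sketch of a fixed window limiter with a queue, using the 75 permits per 10 seconds and queue limit of 25 described above. It is a simplified model for illustration, not the ASP.NET Core implementation; the class, its field names, and the "allowed"/"queued"/"rejected" labels are all made up here, with "rejected" standing in for the 429 response.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FixedWindowLimiter:
    permit_limit: int = 75        # requests admitted per window
    window_seconds: float = 10.0  # window length
    queue_limit: int = 25         # requests allowed to wait for the next window
    window_start: float = 0.0
    used: int = 0
    queued: int = 0

    def _roll(self, now: float) -> None:
        """Advance to the window containing `now`, draining the queue."""
        while now >= self.window_start + self.window_seconds:
            self.window_start += self.window_seconds
            # Waiting requests take the first permits of the new window.
            admitted = min(self.queued, self.permit_limit)
            self.queued -= admitted
            self.used = admitted

    def request(self, now: float) -> str:
        """Classify one request arriving at time `now` (in seconds)."""
        self._roll(now)
        if self.used < self.permit_limit and self.queued == 0:
            self.used += 1
            return "allowed"
        if self.queued < self.queue_limit:
            self.queued += 1
            return "queued"
        return "rejected"


limiter = FixedWindowLimiter()
burst = Counter(limiter.request(0.0) for _ in range(110))
print(burst["allowed"], burst["queued"], burst["rejected"])  # prints "75 25 10"
```

In this model a queued request waits until the next window starts, which can be up to 10 seconds with these settings, so a site that is over its limit holds some requests open for a while rather than answering them immediately.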
Does the issue persist if you remove the rate limiting?
We didn't try that and aren't going to. Frankly, the primary purpose of the reverse proxy is to implement rate limiting.
Do you have any corresponding logs from the backend servers indicating why they're not accepting connections?
Well, no. That's the conundrum - the back end web applications ARE able to accept connections while YARP is logging 504 errors, if, for instance, one makes requests to them via a local hostname that bypasses the reverse proxy.
Would you be able to capture a network trace while this is happening (Wireshark)?
Closing this one as not actionable from our side at the moment. Please feel free to reopen if you're able to collect more info / create a minimal repro.
Describe the bug
We have slowly added sites to a YARP proxy on IIS for the past few weeks. The downstream sites are also running on IIS on the same machine. After adding a sixth site, YARP crapped the bed and returned constant 504 Gateway Timeout errors for all sites being proxied. The traffic it was handling was relatively intense for us (maybe up to a hundred thousand requests per hour?) but the reverse proxy site didn't appear to be sweating in terms of CPU or memory usage.
Many .NET Runtime exceptions are logged to the Windows Application event log in association with the bed-crapping. An example is below.
Does this mean the Windows machine is underpowered? Would setting connection timeouts help? Would setting minimum thread pool counts help? Just not sure where to start, and the YARP documentation doesn't appear to address anything like this.
Further technical details
YARP 2.2.0-preview.1.24266.1
Windows Server 2022 Standard 10.0.20348 Build 20348
24.0 GB RAM
Intel Xeon Silver 4208 CPU @ 2.10 GHz, 2100 MHz, 2 Cores, 2 Logical Processors x 4