microsoft / reverse-proxy

A toolkit for developing high-performance HTTP reverse proxy applications.
https://microsoft.github.io/reverse-proxy
MIT License
8.44k stars, 831 forks

YARP has a higher cpu usage than Nginx #2427

Open doddgu opened 6 months ago

doddgu commented 6 months ago

Sorry, I don't know if it is a bug.

Describe the bug

I deployed 3 Nginx instances in Hong Kong and 3 YARP instances in Hangzhou.

Client -> Nginx -> Yarp -> Service

Nginx forwards traffic for several services, and YARP forwards one of them.

Nginx CPU image

YARP CPU image

YARP other metrics image

Htop (Cat.Service.dll is based on YARP) image

I tried to analyze the CPU usage in VS. Top functions: image

Module View image

To Reproduce

No exception.

Further technical details

All machines have 4 cores and 8 GB of RAM. YARP runs on Ubuntu 22.04, Nginx on CentOS. YARP 2.1.0 runs on .NET 8.

Tratcher commented 6 months ago

How does the load / RPS compare?

doddgu commented 6 months ago

How does the load / RPS compare?

Each YARP instance handles almost 4,000 RPS. image

doddgu commented 6 months ago

I loaded the PDB symbols and found that the busy threads are in the WorkerThreadStart method. Within it, Thread.CurrentThread.SetThreadPoolWorkerThreadName() takes up a lot of CPU.

I don't know why WorkerThreadStart has to be called so many times.

image

image

image

doddgu commented 6 months ago

I profiled against the YARP source code and found that YARP itself does not have high CPU usage.

image

doddgu commented 6 months ago

Hi @MihaZupan , any news?

doddgu commented 6 months ago

Is it related to https://github.com/dotnet/runtime/issues/70098? I see there is a PR to fix it.

doddgu commented 1 month ago

@MihaZupan hi, is there any news? In my case, I have a service that handles 120,000 QPS. It needs only 3 Nginx instances but 40 YARP instances, which troubles me. I tried .NET 9 and saw a performance improvement of about 20%, but that still leaves a big gap. Are there any temporary workarounds I could try? I'm happy to test them.
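To put those figures in per-instance terms, here is a rough calculation. It is only a sketch: it assumes the load is spread evenly across instances and that the Nginx and YARP machines are comparable, neither of which is stated in this comment.

```python
# Back-of-the-envelope per-instance throughput from the figures quoted above.
# Assumption (not stated in the thread): load is split evenly across instances.

TOTAL_QPS = 120_000     # total load for the service, from the comment
NGINX_INSTANCES = 3     # from the comment
YARP_INSTANCES = 40     # from the comment


def per_instance_qps(total_qps: int, instances: int) -> float:
    """Average QPS each instance handles under an even split."""
    if instances <= 0:
        raise ValueError("instances must be positive")
    return total_qps / instances


nginx_qps = per_instance_qps(TOTAL_QPS, NGINX_INSTANCES)
yarp_qps = per_instance_qps(TOTAL_QPS, YARP_INSTANCES)
ratio = nginx_qps / yarp_qps

print(f"Nginx: {nginx_qps:,.0f} QPS per instance")
print(f"YARP:  {yarp_qps:,.0f} QPS per instance")
print(f"Nginx handles about {ratio:.1f}x the load per instance")
```

Under those assumptions, each Nginx instance carries roughly 40,000 QPS and each YARP instance roughly 3,000 QPS.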

zhenlei520 commented 1 month ago

The performance gap is so obvious, is there any room for improvement?

zhenlei520 commented 3 weeks ago

How does the load / RPS compare?

Is there any news about this issue? From observations over the past few days, we found that when the response time of downstream services fluctuates, the proxy comes under heavy pressure. Simply put, a load that normally needs about 100 threads needs more threads when downstream latency rises. Requests pile up, and the thread pool quickly starts more threads to handle them. This rapid change in thread count over a short period causes obvious CPU fluctuations. Once the downstream stabilizes, threads that have been idle for a long time are destroyed, so every downstream fluctuation has a large impact on the proxy. Setting the minimum number of threads does not prevent the thread pool from recycling threads later; it only allows more threads to be started quickly. We would like to keep these threads alive permanently and avoid the large CPU fluctuations caused by frequent thread startups.

 ThreadPool.SetMinThreads(500, 500);
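The effect described above can be approximated with Little's law (average in-flight requests = arrival rate × average latency). The sketch below is illustrative only: the 4,000 RPS figure is the per-instance load mentioned earlier in this thread, the latency values are hypothetical, and in-flight requests are only a rough proxy for thread demand because YARP forwards requests asynchronously.

```python
# Little's law: average in-flight requests = arrival rate * average latency.
# The latencies below are hypothetical; 4,000 RPS is the per-instance figure
# quoted earlier in the thread.

def in_flight_requests(rps: int, latency_ms: int) -> float:
    """Average number of requests in flight at a given rate and latency."""
    return rps * latency_ms / 1000


RPS = 4_000

for latency_ms in (25, 50, 100, 250):
    concurrent = in_flight_requests(RPS, latency_ms)
    print(f"{latency_ms:>4} ms downstream latency -> {concurrent:,.0f} requests in flight")
```

At a fixed request rate, concurrency scales linearly with downstream latency, so a latency spike from 25 ms to 100 ms would quadruple the number of requests the proxy has to hold open at once.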

@Tratcher @MihaZupan

zhenlei520 commented 3 weeks ago

image image image

doddgu commented 1 week ago

We upgraded from .NET 8 to the .NET 9 preview and set some environment variables.

The most obvious improvement in .NET 9 is that memory usage is halved.

DOTNET_SYSTEM_NET_SOCKETS_THREAD_COUNT = 500
DOTNET_ThreadPool_UnfairSemaphoreSpinLimit = 0
DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS = 1

Overall, it does consume less CPU (around 30% less), and we no longer see minute-long blockages causing widespread timeouts when the number of current requests suddenly increases. However, a small fraction of requests still time out, and CPU fluctuations have become very frequent. Our tracing shows that the downstream service responds quickly and that the occasional timeouts are caused by YARP, but because the QPS is relatively high, these timeouts are not visible on the dashboard. We have another upstream service that is particularly sensitive to abnormal requests, and there we see these low-probability timeouts occurring very frequently.

First, let's look at the performance of YARP, which has indeed improved. image

These are abnormal requests detected upstream, all of which are SocketExceptions. image

In summary: Setting thread-related parameters can reduce CPU usage but will introduce more instability, and there is still a significant gap compared to Nginx.

zhenlei520 commented 3 days ago

Later we made some adjustments to the configuration

<PropertyGroup>
  <TargetFramework>net9.0</TargetFramework>
  <GarbageCollectionAdaptationMode>0</GarbageCollectionAdaptationMode>
</PropertyGroup>

Environment variables

DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0

After turning off spinning, CPU usage improved by nearly 40%, which is a big improvement. However, according to the available information, this setting will affect QPS. We have not yet added link monitoring, so the impact on QPS is not yet known. From the perspective of upstream requests, the average response time is not greatly affected.

image

However, compared with Nginx, YARP still has a lot of room for improvement. We hope to use it in place of other reverse proxy products.