grpc / grpc-java

The Java gRPC implementation. HTTP/2 based RPC
https://grpc.io/docs/languages/java/
Apache License 2.0
11.36k stars 3.82k forks

Reduce Thread count in default Event Loop Group #2123

Open carl-mastrangelo opened 8 years ago

carl-mastrangelo commented 8 years ago

By default, ELGs use twice as many threads as there are processors. This appears to be intended for applications that do a lot of work on their network threads, which gRPC does not. Consider reducing ELG threads to the number of processors.

buchgr commented 8 years ago

I guess I never understood why Netty by default creates 2 * core count-many threads. Netty never blocks, so why is there a need for more threads than hardware threads? 😅

carl-mastrangelo commented 8 years ago

We typically use 2 network threads internally, with the expectation that most people will provide an executor. I propose the following:

- 2 Netty threads if not using direct
- $NUMBER_OF_CORES Netty threads if using direct

@buchgr @ejona86 thoughts?
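
The proposal above can be sketched as a small helper. This is only an illustration under stated assumptions: the class and method names are hypothetical and not part of grpc-java, and "direct" is taken to mean that application callbacks run directly on the network threads rather than on a separate executor.

```java
/** Hypothetical helper illustrating the proposed defaults; not grpc-java API. */
final class EventLoopThreadPolicy {
    private EventLoopThreadPolicy() {}

    /**
     * Returns the event loop thread count under the proposal in this comment.
     *
     * @param usesDirectExecutor  true if callbacks run on the network threads themselves
     * @param availableProcessors logical processor count, e.g. from Runtime.availableProcessors()
     */
    static int proposedThreads(boolean usesDirectExecutor, int availableProcessors) {
        if (availableProcessors < 1) {
            throw new IllegalArgumentException("availableProcessors must be >= 1");
        }
        // With a direct executor the network threads also do the application work,
        // so the proposal scales them with the core count; otherwise 2 threads.
        return usesDirectExecutor ? availableProcessors : 2;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("not direct: " + proposedThreads(false, cores));
        System.out.println("direct:     " + proposedThreads(true, cores));
    }
}
```

The fixed value of 2 is taken literally from the proposal; whether it should be capped on a single-core machine is not discussed here.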

pgrosu commented 8 years ago

The reasoning behind 2*(number of processors) has to do with Hyper-Threading Technology (HTT). Basically, one processor appears as two logical processors, each having its own architectural state with the execution resources shared. Each logical processor can operate in single-task (ST) or multi-task (MT) mode, and idle periods of one thread allow the execution resources to be maximized. I can go into more detail if necessary, but I think the following paper describes it nicely:

Hyper-Threading Technology Architecture and Microarchitecture.pdf

Hope it helps, ~p

ejona86 commented 8 years ago

The threads the OS exposes are virtual and already account for hyper threading. So if anything, we would divide by 2 to get real cores instead of the volatility of hyper threading.

I'm not too against 2 threads. The main problem with users providing their own is it is a pain to coordinate across all users of Channel. So server-side I think it would be easier to use just 2 (but yes, that doesn't get us far because we share the pool between client and server). How easy is it to override the executor internally? We could let the user choose statically, although that has its own problems.

pgrosu commented 8 years ago

Eric, you're right: Runtime.getRuntime().availableProcessors() returns the number of logical processors. Regarding the doubling of logical cores in Netty, I found the following issue referenced:

https://github.com/netty/netty/issues/3888

This is in lines 73-77 of PooledByteBufAllocator.java:

    // Use 2 * cores by default to reduce condition as we use 2 * cores for the number of EventLoops
    // in NIO and EPOLL as well. If we choose a smaller number we will run into hotspots as allocation and
    // deallocation needs to be synchronized on the PoolArena.
    // See https://github.com/netty/netty/issues/3888
    final int defaultMinNumArena = runtime.availableProcessors() * 2;

Paul
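
To make the numbers in this exchange concrete, here is a minimal JDK-only sketch comparing the counts being discussed for the current machine. The class and method names are hypothetical, and the physical-core estimate assumes two hardware threads per core, which does not hold on every CPU.

```java
/** Hypothetical illustration of the thread counts discussed in this thread. */
final class ProcessorCounts {
    private ProcessorCounts() {}

    /** The 2 * cores default quoted from Netty above. */
    static int doubledDefault(int logicalProcessors) {
        return Math.max(1, logicalProcessors * 2);
    }

    /** The reduction proposed in this issue: one thread per logical processor. */
    static int reducedDefault(int logicalProcessors) {
        return Math.max(1, logicalProcessors);
    }

    /** Rough physical-core estimate, ASSUMING 2 hardware threads per core. */
    static int estimatedPhysicalCores(int logicalProcessors) {
        return Math.max(1, logicalProcessors / 2);
    }

    public static void main(String[] args) {
        // availableProcessors() already reports logical (hyper-threaded) processors.
        int logical = Runtime.getRuntime().availableProcessors();
        System.out.println("logical processors:       " + logical);
        System.out.println("2 * cores default:        " + doubledDefault(logical));
        System.out.println("proposed reduced default: " + reducedDefault(logical));
        System.out.println("estimated physical cores: " + estimatedPhysicalCores(logical));
    }
}
```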

fabiofumarola commented 7 years ago

In the meantime, you can reduce the size of the thread pool as follows:

    val server = NettyServerBuilder.forPort(port)
      .workerEventLoopGroup(new NioEventLoopGroup(16))