grpc / grpc-swift

The Swift language implementation of gRPC.
Apache License 2.0

gRPC server uses only single CPU #992

Closed bugrayilmazz closed 4 years ago

bugrayilmazz commented 4 years ago

Describe the bug

We have a client-side streaming gRPC service which executes some computationally heavy operations. Although we use many threads to stream to the service simultaneously, the server does not utilize the multiple CPUs available to it; it only uses a single CPU. This causes the service to process the data more slowly than we expect.

We are using the latest version of grpc-swift (1.0.0-alpha.20).

We are using the following to start our service (simplified):

import Foundation
import GRPC
import NIO
import SwiftProtobuf
import MyServerLib

func main() throws {
    let group = MultiThreadedEventLoopGroup(numberOfThreads: System.coreCount)
    defer {
        try! group.syncShutdownGracefully()
    }

    let provider: MyServerProvider
    do {
        provider = try MyServerProvider()
    } catch {
        print("Failed to create MyServerProvider: \(error)")
        return
    }

    let server = Server.insecure(group: group)
        .withServiceProviders([provider])
        .bind(host: "localhost", port: 7877)

    server.map {
        $0.channel.localAddress
    }.whenSuccess { address in
        print("server started on port \(address!.port!)")
    }

    _ = try server.flatMap {
        $0.onClose
    }.wait()
}

try main()

We are compiling the server code and running our service on Ubuntu 18.04. We monitor the CPU usage with htop.

To reproduce

  1. Build a client-side streaming gRPC service like the one described at https://github.com/grpc/grpc-swift/tree/main/Sources/Examples/RouteGuide
  2. Compile and run the server on Ubuntu 18.04.
  3. Use multiple threads to send requests to the server simultaneously.
  4. Monitor the CPU usage on the machine where you run the server.

Expected behaviour

We expect the server to utilize all the CPU cores available on the machine, since it performs computationally expensive operations. Ideally, we would see high CPU utilization across all cores.

Additional information

We are using Swift 5.2.

glbrntt commented 4 years ago

This is the intended design. A server will typically handle many connections concurrently, each connection will have its own thread, and requests on that connection will be handled on that thread.

If your service provider has to do computationally heavy work then you should offload it onto a NIOThreadPool.
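
For illustration, a minimal sketch of that recommendation, not taken from the issue: the work is handed to a NIOThreadPool, and the returned future fires back on the handler's event loop. The names heavyComputation, MyData and MyResponse are placeholders.

import NIO

// Create the pool once, e.g. alongside the EventLoopGroup, and start it.
let threadPool = NIOThreadPool(numberOfThreads: System.coreCount)
threadPool.start()

// Run the CPU-bound work on the pool instead of on the connection's event loop.
// The future returned by runIfActive completes on `eventLoop`.
func handle(_ request: MyData, on eventLoop: EventLoop) -> EventLoopFuture<MyResponse> {
    return threadPool.runIfActive(eventLoop: eventLoop) {
        return heavyComputation(request)  // placeholder for the heavy work
    }
}

(The pool should be shut down with try threadPool.syncShutdownGracefully() when the server stops.)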

bugrayilmazz commented 4 years ago

@glbrntt we understand that each connection has its own thread. But why should that prevent the threads from utilizing all the CPUs?

Here is how we implement our server (simplified):

import Foundation
import GRPC
import NIO
import NIOConcurrencyHelpers
import Logging

class MyServerProvider: MyStreamingServiceProvider {
    init() {
        // Initializer
    }

    // processData function is called when a client starts streaming
    func processData(context: UnaryResponseCallContext<MyResponse>) -> EventLoopFuture<(StreamEvent<MyData>) -> Void> {
        return context.eventLoop.makeSucceededFuture({ event in
            switch event {
            case let .message(message):
                // Computationally heavy operations on `message`
                break

            case .end:
                // Prepare the result and succeed context.responsePromise with it
                break
            }
        })
    }
}

glbrntt commented 4 years ago

If each connection is bound to a single thread how could that thread utilise all CPU cores?

bugrayilmazz commented 4 years ago

@glbrntt I think there is a misunderstanding. Maybe my description was not clear enough, sorry for that.

We don't want a single connection to utilize multiple CPUs. Our server is supposed to handle many connections at the same time. Let's say 30 clients are streaming to our server simultaneously. In that case, there will be 30 threads handling these connections. Every connection is bound to a single thread. But the server itself does not distribute these connections to multiple CPUs. It ends up using only one CPU for handling all 30 connections. That was our concern.
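
As a rough aside, not from the thread: that expectation matches how MultiThreadedEventLoopGroup behaves, since it hands out its event loops round-robin, so separate inbound connections should end up on separate threads.

import NIO

// Each call to next() returns the group's event loops in turn, which is how
// accepted connections get spread across threads.
let demoGroup = MultiThreadedEventLoopGroup(numberOfThreads: 4)
for i in 0..<8 {
    print("connection \(i) -> \(demoGroup.next())")
}
try demoGroup.syncShutdownGracefully()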

glbrntt commented 4 years ago

That makes much more sense! What does System.coreCount return?

bugrayilmazz commented 4 years ago

@glbrntt it returns 36.

glbrntt commented 4 years ago

What does your client code look like? Are you using grpc-swift?

bugrayilmazz commented 4 years ago

@glbrntt we are using Python clients to test the server. They are based on generic client code that we use for testing other, similar gRPC services. We use a ThreadPool to create multiple clients, and each thread streams a different audio file to the server.

bugrayilmazz commented 4 years ago

@glbrntt btw, we added a computationally expensive, dummy calculation to the unary service of the server provided in the RouteGuide example. When we run this modified server on Ubuntu 18.04 and send requests to it from multiple clients simultaneously (I tested with 32 clients), it drives only a single CPU to 100% utilization and does not scale onto the other CPUs available in the machine.

If it would help, I can share the code to reproduce this.
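
The exact dummy calculation isn't shown in the thread; as an assumption of what such a workload might look like, a tight numeric loop is enough to keep a core busy:

import Foundation

// Hypothetical CPU-bound busy work, standing in for the "computationally
// expensive, dummy calculation" added to the RouteGuide unary handler.
func dummyComputation(iterations: Int = 5_000_000) -> Double {
    var accumulator = 0.0
    for i in 1...iterations {
        accumulator += sin(Double(i)) * cos(Double(i))
    }
    return accumulator
}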

glbrntt commented 4 years ago

I can't reproduce this. I'm not familiar with the Python client, but my assumption is that the size of the thread pool doesn't map to the number of connections.

Can you enable debug logging on the server and grep for "determined http version"? That's logged once per connection, so we can use it as a proxy for the number of accepted connections.

bugrayilmazz commented 4 years ago

@glbrntt how can I enable debug logging on the server?

glbrntt commented 4 years ago

import Logging

...

var logger = Logger(label: "grpc", factory: StreamLogHandler.standardOutput)
logger.logLevel = .debug

let server = Server.insecure(group: group)
    .withLogger(logger)
    .withServiceProviders([provider])
    .bind(host: "localhost", port: 7877)

bugrayilmazz commented 4 years ago

@glbrntt thanks.

When I run the modified RouteGuide server for some time with 32 clients sending requests simultaneously, grepping for "determined http version" gives only the following log line:

2020-10-14T16:33:34+0300 debug grpc : http_version=http2 grpc_connection_id=4C98F958-A7EC-4DE2-ABFB-AE4BE6F9502F remote_address=[IPv4]127.0.0.1/127.0.0.1:42506 determined http version

glbrntt commented 4 years ago

Thanks. One log line means one connection: your 32 clients are all using the same connection.
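
For comparison, a grpc-swift test harness that wanted 32 separate connections would create one ClientConnection per simulated client; this sketch is an illustration, not code from the issue.

import GRPC
import NIO

// Each ClientConnection is its own HTTP/2 connection, so creating one per
// simulated client lets the server spread them across its event loops.
let clientGroup = MultiThreadedEventLoopGroup(numberOfThreads: 4)
let connections = (0..<32).map { _ in
    ClientConnection.insecure(group: clientGroup)
        .connect(host: "localhost", port: 7877)
}
// ...create one generated service client per connection and stream from each...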

Regardless of this, the comment I left earlier is still my recommendation here:

If your service provider has to do computationally heavy work then you should offload it onto a NIOThreadPool.

bugrayilmazz commented 4 years ago

@glbrntt thank you for your help!

We tried starting each client in a separate process, and we got many new connection logs from the server. The server utilizes all the CPUs when we do this.

So it seems the problem was in our client implementation; the server works fine.