pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/

Netty Threads and Performance #1700

Closed. Hegelim closed this issue 2 years ago.

Hegelim commented 2 years ago

I have read the configuration docs at https://pytorch.org/serve/configuration.html. I am wondering: do number_of_netty_threads and netty_client_threads have anything to do with performance (throughput)? When I start my TorchServe model with a config.properties that looks like the one below:

inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081
metrics_address=http://127.0.0.1:8082
load_models=ABINet.mar
models={\
  "ABINet": {\
    "1.0": {\
        "defaultVersion": true,\
        "marName": "ABINet.mar",\
        "runtime": "python",\
        "minWorkers": 3,\
        "maxWorkers": 8,\
        "batchSize": 200,\
        "maxBatchDelay": 7000,\
        "responseTimeout": 120,\
        "number_of_netty_threads": 8,\
        "max_request_size": 65535000\
    }\
  }\
}
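
(For reference, a typical launch command for a config like this would be something along these lines; the model-store path here is an assumption:)

torchserve --start --ts-config config.properties --model-store model_store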

The model started properly. However, the startup logs report 0 for both number_of_netty_threads and netty_client_threads. What might be causing this? [screenshot of the startup logs]

msaroufim commented 2 years ago

Those two parameters need better names.

For both, the default is the number of logical cores available to the JVM, which is a reasonable default that maximizes throughput. Increasing it beyond that may cause thread oversubscription, which will tank performance.
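
For reference, the configuration docs list both as top-level config.properties keys rather than per-model entries in the models JSON, so a minimal sketch of overriding the defaults would look like this (the value 4 is only a placeholder):

# top-level keys in config.properties, not inside the models JSON
# frontend event-loop threads serving the HTTP APIs
number_of_netty_threads=4
# backend threads writing inference responses to the frontend
netty_client_threads=4

Note that in the config quoted above, these keys sit inside the per-model JSON block, which may be why they read back as 0.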

Finding the ideal configuration is often a benchmark-and-see exercise; you can use https://github.com/pytorch/serve/tree/master/benchmarks#auto-benchmarking-with-apache-bench to help you out.
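
For a quick manual sanity check with plain Apache Bench (outside that auto-benchmarking suite), a sketch like the one below could work; sample.json stands in for a real request payload and is an assumption:

# 1000 keep-alive POST requests at concurrency 8 against the predictions endpoint
ab -k -n 1000 -c 8 -p sample.json -T application/json http://127.0.0.1:8080/predictions/ABINet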

Hegelim commented 2 years ago

Thanks for your reply! Any idea why my netty thread counts are 0 even though I set them explicitly in config.properties?

msaroufim commented 2 years ago

Since there's a new issue about this, I'll close this one for now.

ozancaglayan commented 2 years ago

Hi,

What would be the best practice for setting these two parameters along with the number of worker processes for the model? Let's say we have 1 model and 4 physical CPUs. Is it better for the sum of number_of_netty_threads, netty_client_threads, and default_workers_per_model to be <= 4? Or, since the threads are much more lightweight than worker processes, can we ignore potential oversubscription and set 4 worker processes, with a couple of netty and client threads on the side? (See the two sketches below.)
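
For concreteness, the two options could be sketched as config.properties fragments (purely illustrative, assuming default_workers_per_model is the key used for the worker count; the benchmark-and-see advice above still applies):

# option A: keep the total at or under the 4 physical CPUs
default_workers_per_model=2
number_of_netty_threads=1
netty_client_threads=1

# option B: give every CPU a worker and treat the I/O-bound netty threads as cheap
default_workers_per_model=4
number_of_netty_threads=2
netty_client_threads=2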