This PR adds the ONNX model as well as the related scripts. On the Python client, the ONNX model is roughly 4x faster than PyTorch: 15.7 ms vs 59.8 ms.
The BERT->ONNX conversion notebook can be found at `rapids_triton_example/pytorch_to_onnx/pytorch_to_onnx.ipynb`.
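For reference, here is a minimal sketch of the conversion step the notebook performs. The model checkpoint, tensor names, sequence length, and opset version below are illustrative assumptions; the notebook is the authoritative version.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Assumed checkpoint; the notebook may use a fine-tuned sentiment model instead.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

# Dummy inputs just to trace shapes; dynamic axes below allow other batch/seq sizes.
batch_size, seq_len = 1, 128
dummy_input_ids = torch.ones(batch_size, seq_len, dtype=torch.int64)
dummy_attention_mask = torch.ones(batch_size, seq_len, dtype=torch.int64)

torch.onnx.export(
    model,
    (dummy_input_ids, dummy_attention_mask),
    # Triton model-repository layout: <model_name>/<version>/model.onnx
    "sentiment_onnx_model/1/model.onnx",
    input_names=["input_ids", "attention_mask"],   # assumed tensor names
    output_names=["logits"],                       # assumed tensor name
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "logits": {0: "batch"},
    },
    opset_version=13,
)
```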
```
perf_analyzer -m end_to_end_onnx -b 8 --shape product_reviews:1 --string-data "This product is the greatest of all time. I really recommend it . Product is really nice. Cant recommend it enough" -i grpc --async
```
```
*** Measurement Settings ***
  Batch size: 8
  Using "time_windows" mode for stabilization
  Measurement window: 5000 msec
  Using asynchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
  Client:
    Request count: 880
    Throughput: 1408 infer/sec
    Avg latency: 5615 usec (standard deviation 90 usec)
    p50 latency: 5590 usec
    p90 latency: 5736 usec
    p95 latency: 5802 usec
    p99 latency: 5922 usec
    Avg gRPC time: 5608 usec ((un)marshal request/response 3 usec + response wait 5605 usec)
  Server:
    Inference count: 8448
    Execution count: 1056
    Successful request count: 1056
    Avg request latency: 5406 usec (overhead 68 usec + queue 46 usec + compute 5292 usec)

  Composing models:
  rapids_tokenizer, version: 1
      Inference count: 1056
      Execution count: 1056
      Successful request count: 1056
      Avg request latency: 941 usec (overhead 1 usec + queue 30 usec + compute input 5 usec + compute infer 889 usec + compute output 16 usec)

  sentiment_onnx_model, version: 1
      Inference count: 1056
      Execution count: 1056
      Successful request count: 1056
      Avg request latency: 4456 usec (overhead 58 usec + queue 16 usec + compute input 112 usec + compute infer 4263 usec + compute output 7 usec)

Inferences/Second vs. Client Average Batch Latency
Concurrency: 1, throughput: 1408 infer/sec, latency 5615 usec
```
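Outside of `perf_analyzer`, the `end_to_end_onnx` ensemble can also be queried from Python with `tritonclient`. A hedged sketch, assuming the server's gRPC endpoint on `localhost:8001` and an output tensor named `preds` (check the ensemble's `config.pbtxt` for the real output name):

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Input name and shape match the perf_analyzer invocation above:
# one string per sample, shape [batch, 1].
reviews = np.array([["This product is the greatest of all time."]], dtype=object)
inp = grpcclient.InferInput("product_reviews", reviews.shape, "BYTES")
inp.set_data_from_numpy(reviews)

result = client.infer(model_name="end_to_end_onnx", inputs=[inp])
print(result.as_numpy("preds"))  # "preds" is an assumed output name
```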