This PR adds the ONNX model as well as the related scripts. On the Python client, the ONNX model is roughly 4x faster than PyTorch: 15.7 ms vs 59.8 ms.
The BERT->ONNX conversion notebook can be found at `rapids_triton_example/pytorch_to_onnx/pytorch_to_onnx.ipynb`.
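For reference, here is a minimal sketch of the conversion step the notebook performs. The model checkpoint, tensor names, sequence length, and opset version below are illustrative assumptions; the notebook is the authoritative version.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Assumed checkpoint; the notebook may use a fine-tuned sentiment model instead.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

# Dummy inputs just to trace shapes; dynamic axes below allow other batch/seq sizes.
batch_size, seq_len = 1, 128
dummy_input_ids = torch.ones(batch_size, seq_len, dtype=torch.int64)
dummy_attention_mask = torch.ones(batch_size, seq_len, dtype=torch.int64)

torch.onnx.export(
    model,
    (dummy_input_ids, dummy_attention_mask),
    # Triton model-repository layout: <model_name>/<version>/model.onnx
    "sentiment_onnx_model/1/model.onnx",
    input_names=["input_ids", "attention_mask"],   # assumed tensor names
    output_names=["logits"],                       # assumed tensor name
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "logits": {0: "batch"},
    },
    opset_version=13,
)
```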
```
perf_analyzer -m end_to_end_onnx -b 8 --shape product_reviews:1 --string-data "This product is the greatest of all time. I really recommend it . Product is really nice. Cant recommend it enough" -i grpc --async
```
```
*** Measurement Settings ***
  Batch size: 8
  Using "time_windows" mode for stabilization
  Measurement window: 5000 msec
  Using asynchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
  Client:
    Request count: 880
    Throughput: 1408 infer/sec
    Avg latency: 5615 usec (standard deviation 90 usec)
    p50 latency: 5590 usec
    p90 latency: 5736 usec
    p95 latency: 5802 usec
    p99 latency: 5922 usec
    Avg gRPC time: 5608 usec ((un)marshal request/response 3 usec + response wait 5605 usec)
  Server:
    Inference count: 8448
    Execution count: 1056
    Successful request count: 1056
    Avg request latency: 5406 usec (overhead 68 usec + queue 46 usec + compute 5292 usec)

  Composing models:
  rapids_tokenizer, version: 1
      Inference count: 1056
      Execution count: 1056
      Successful request count: 1056
      Avg request latency: 941 usec (overhead 1 usec + queue 30 usec + compute input 5 usec + compute infer 889 usec + compute output 16 usec)

  sentiment_onnx_model, version: 1
      Inference count: 1056
      Execution count: 1056
      Successful request count: 1056
      Avg request latency: 4456 usec (overhead 58 usec + queue 16 usec + compute input 112 usec + compute infer 4263 usec + compute output 7 usec)

Inferences/Second vs. Client Average Batch Latency
Concurrency: 1, throughput: 1408 infer/sec, latency 5615 usec
```
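Outside of `perf_analyzer`, the `end_to_end_onnx` ensemble can also be queried from Python with `tritonclient`. A hedged sketch, assuming the server's gRPC endpoint on `localhost:8001` and an output tensor named `preds` (check the ensemble's `config.pbtxt` for the real output name):

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Input name and shape match the perf_analyzer invocation above:
# one string per sample, shape [batch, 1].
reviews = np.array([["This product is the greatest of all time."]], dtype=object)
inp = grpcclient.InferInput("product_reviews", reviews.shape, "BYTES")
inp.set_data_from_numpy(reviews)

result = client.infer(model_name="end_to_end_onnx", inputs=[inp])
print(result.as_numpy("preds"))  # "preds" is an assumed output name
```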