triton-inference-server / openvino_backend

OpenVINO backend for Triton.

Add inference support to the backend #4

Closed: tanmayv25 closed this issue 3 years ago

tanmayv25 commented 3 years ago

I was able to resolve the issues and run perf_analyzer on the models.

# ./qa/clients/perf_analyzer -m  resnet50_fp16_openvino
*** Measurement Settings ***
  Batch size: 1
  Measurement window: 5000 msec
  Using synchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
  Client: 
    Request count: 378
    Throughput: 75.6 infer/sec
    Avg latency: 13220 usec (standard deviation 1531 usec)
    p50 latency: 12491 usec
    p90 latency: 14745 usec
    p95 latency: 16421 usec
    p99 latency: 18464 usec
    Avg HTTP time: 13191 usec (send/recv 141 usec + response wait 13050 usec)
  Server: 
    Inference count: 455
    Execution count: 455
    Successful request count: 455
    Avg request latency: 12285 usec (overhead 56 usec + queue 26 usec + compute input 370 usec + compute infer 11761 usec + compute output 72 usec)

Inferences/Second vs. Client Average Batch Latency
Concurrency: 1, throughput: 75.6 infer/sec, latency 13220 usec
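
For reference, here is a minimal Python client sketch that sends a single request to the same model over Triton's HTTP endpoint. The tensor names ("data", "prob"), the 1x3x224x224 FP16 shape, and the localhost:8000 URL are assumptions for illustration, not taken from the model config above; adjust them to match the actual model.

```python
# Sketch: one inference against the OpenVINO-backed model via Triton's
# HTTP API. Tensor names, shape, and URL are assumed, not from the repo.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Random data stands in for a preprocessed image.
image = np.random.rand(1, 3, 224, 224).astype(np.float16)

inputs = [httpclient.InferInput("data", [1, 3, 224, 224], "FP16")]
inputs[0].set_data_from_numpy(image)
outputs = [httpclient.InferRequestedOutput("prob")]

result = client.infer("resnet50_fp16_openvino", inputs, outputs=outputs)
print(result.as_numpy("prob").shape)
```

Note that perf_analyzer generates its own random input tensors, so no client code like the above is needed just to reproduce the benchmark run; the sketch is only meant to show how an application would issue requests against the same deployment.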