Replaced the synchronous inference call with an asynchronous one, which boosts throughput when multiple requests run concurrently
Added options to configure more parameters: performance_hint and a numeric value for NUM_STREAMS. This makes it possible to tune performance to the expected load
Added documentation and an example of how to configure Triton for low- and high-concurrency loads
Added and documented all OpenVINO (OV) frontends; previously only the IR format was supported
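
As an illustration of the tuning options listed above, here is a minimal sketch that renders `parameters` entries for a Triton `config.pbtxt`. It assumes the options are passed as string-valued entries with the keys `PERFORMANCE_HINT` and `NUM_STREAMS`; the helper `render_parameters`, the two preset dicts, and the stream count of 4 are hypothetical and chosen only for the example, so check the backend documentation for the exact keys and accepted values.

```python
from typing import Dict


def render_parameters(params: Dict[str, str]) -> str:
    """Render a dict as Triton config.pbtxt `parameters` entries.

    Hypothetical helper: it only formats text and does not validate
    that the backend accepts the given keys or values.
    """
    blocks = []
    for key, value in params.items():
        blocks.append(
            "parameters: {\n"
            f'  key: "{key}"\n'
            "  value: {\n"
            f'    string_value: "{value}"\n'
            "  }\n"
            "}"
        )
    return "\n".join(blocks)


# Assumed preset for low concurrency: favour single-request latency.
LOW_CONCURRENCY = {"PERFORMANCE_HINT": "LATENCY"}

# Assumed preset for high concurrency: several parallel streams.
# The value "4" is a placeholder, not a recommendation.
HIGH_CONCURRENCY = {"NUM_STREAMS": "4"}


if __name__ == "__main__":
    print("# low concurrency")
    print(render_parameters(LOW_CONCURRENCY))
    print("# high concurrency")
    print(render_parameters(HIGH_CONCURRENCY))
```

The rendered text can be pasted into the model's `config.pbtxt`. Keeping the presets as plain dicts makes it easy to try a different stream count for a given load.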