triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Is it true? #7415

Open · kadisyy opened this issue 3 months ago

kadisyy commented 3 months ago

I just want to confirm one thing:

My model has 1000 parameters, but the inference duration is only about 20 µs. I use a PromQL query like this:

```
avg(rate(nv_inference_compute_infer_duration_us{app=~"$app", env="${env}"}[2m]) / rate(nv_inference_count{app=~"$app", env="${env}"}[2m])) by (model, instance_name)
```
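That ratio is the average compute time per inference over the window: the rate of the cumulative microsecond counter divided by the rate of the inference counter. Note that `nv_inference_compute_infer_duration_us` covers only the compute phase, so queue time and input/output handling are excluded. As a cross-check of the Grafana number, here is a minimal sketch (not an official Triton utility) that scrapes Triton's default metrics endpoint on port 8002 twice and divides the counter deltas by hand; the model name `"mymodel"` and the 10-second interval are placeholders:

```python
# Sketch: compute the same per-inference average the PromQL query reports,
# directly from two samples of Triton's Prometheus counters.
# Assumes the default metrics endpoint (localhost:8002/metrics), a model
# named "mymodel", and that new requests arrive during the sleep.
import re
import time
import urllib.request

METRICS_URL = "http://localhost:8002/metrics"  # Triton's default metrics endpoint
MODEL = "mymodel"  # placeholder: replace with your model's name

def sample(metric_name):
    """Return the value of the first sample of `metric_name` for MODEL."""
    body = urllib.request.urlopen(METRICS_URL).read().decode()
    pattern = rf'^{metric_name}\{{[^}}]*model="{MODEL}"[^}}]*\}} ([0-9.eE+-]+)$'
    return float(re.search(pattern, body, re.MULTILINE).group(1))

# Two samples of the cumulative counters, taken a few seconds apart.
d0 = sample("nv_inference_compute_infer_duration_us")
c0 = sample("nv_inference_count")
time.sleep(10)
d1 = sample("nv_inference_compute_infer_duration_us")
c1 = sample("nv_inference_count")

# Same ratio the PromQL computes: compute microseconds per inference.
print("avg compute time per inference: %.1f us" % ((d1 - d0) / (c1 - c0)))
```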

I feel the duration is too short; is that right?

oandreeva-nv commented 3 months ago

Hi @kadisyy, it is a bit hard to answer this question without more context.

I would recommend timing your model's execution outside of Triton to verify the timing you are seeing.
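A minimal timing harness along those lines might look like the sketch below; `run_model` is a hypothetical placeholder for a direct call into your framework (for example an onnxruntime `session.run` or a PyTorch forward pass), and the warmup and iteration counts are arbitrary:

```python
# Sketch: measure the model's raw execution time outside of Triton.
import time
import statistics

def run_model():
    # Hypothetical placeholder: invoke your model here with fixed inputs.
    pass

# Warm up so lazy initialization and caching do not skew the numbers.
for _ in range(100):
    run_model()

# Time many iterations and report the median, which is robust to outliers.
timings_us = []
for _ in range(1000):
    start = time.perf_counter()
    run_model()
    timings_us.append((time.perf_counter() - start) * 1e6)

print("median latency: %.1f us" % statistics.median(timings_us))
```

Triton's `perf_analyzer` client is another option for measuring per-request latency, including a breakdown of queue and compute time, under a controlled load.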