Closed Slyne closed 3 years ago
The avg queue time does seem wrong. Can you be more specific about what you think is wrong in the other times?
@Slyne I must fix the terminology in the report. These are not the actual total avg compute input, compute infer, compute output, and queue times, but the components as seen by the ensemble scheduler. Because there is no queue in the ensemble and an incoming request proceeds directly to the first step (the first composing model), the queue time is reported as zero.
@deadeyegoodwin I should probably remove the "Total" term to prevent confusion. These numbers will then be just for the model being loaded (quartznet-ensemble), which is an ensemble in this case, followed by the composing models.
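To make the distinction concrete, here is a minimal sketch of how per-model averages could be compared for the ensemble and one of its composing models. It assumes each stage is reported as a cumulative `count` and `ns` pair, as in the per-model statistics Triton exposes through its statistics extension (`v2/models/<name>/stats`). The numbers and the composing model name `hypothetical_composing_model` are made up for illustration and are not taken from this issue.

```python
from typing import Dict

# Stages assumed to be reported per model as cumulative {"count", "ns"} pairs.
STAGES = ("queue", "compute_input", "compute_infer", "compute_output")


def average_stage_times_us(inference_stats: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Return the average time per request, in microseconds, for each stage."""
    averages = {}
    for stage in STAGES:
        entry = inference_stats.get(stage, {})
        count = entry.get("count", 0)
        total_ns = entry.get("ns", 0)
        # Guard against division by zero when no requests were recorded.
        averages[stage] = (total_ns / count) / 1000.0 if count else 0.0
    return averages


# Hypothetical statistics: the ensemble has no queue of its own, so its queue
# time is zero, while the composing model still accumulates queue time.
sample_stats = {
    "quartznet-ensemble": {
        "queue": {"count": 10, "ns": 0},
        "compute_input": {"count": 10, "ns": 1_000_000},
        "compute_infer": {"count": 10, "ns": 50_000_000},
        "compute_output": {"count": 10, "ns": 1_000_000},
    },
    "hypothetical_composing_model": {
        "queue": {"count": 10, "ns": 2_000_000},
        "compute_input": {"count": 10, "ns": 500_000},
        "compute_infer": {"count": 10, "ns": 20_000_000},
        "compute_output": {"count": 10, "ns": 500_000},
    },
}

for model_name, stats in sample_stats.items():
    print(model_name, average_stage_times_us(stats))
```

Read this way, a zero queue time on the ensemble entry does not mean requests never waited; any waiting shows up under the composing models instead.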
Description I used an ensemble model, which consists of three models: 2 trt + 1 pt + 1 custom backend. However, the output is quite strange:
Just wanna ask if the output is normal? And what does it mean to have zero queue time? (The average compute and infer times also don't seem right.)
Triton Information What version of Triton are you using? 20.10
Are you using the Triton container or did you build it yourself? nvcr.io/nvidia/tritonserver:20.10-py3
Expected behavior Expect the avg queue time to be a reasonable figure.