Based on watsonx requirements, we should expose at least the following metrics:
- Number of inference requests over a defined time period
- Average response time over a defined time period
- Number of successful / failed inference requests over a defined time period
- Compute utilization (CPU, GPU, memory)
However, users won't find metrics with exactly these names, and some of them need to be computed by combining existing metrics. Examples:
- Failed inference requests over a defined time period: you must compute something like `tgi_batch_inference_count - tgi_batch_inference_success`, adding the time-period syntax.
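One possible PromQL form of that combination (the 5m window and the `sum` aggregation are illustrative assumptions, not a documented recipe) could be:

```promql
# Failed inference requests over the last 5 minutes (window is an example)
sum(increase(tgi_batch_inference_count[5m]))
  - sum(increase(tgi_batch_inference_success[5m]))
```

The `increase()` function is what turns the raw counters into "over a defined time period" values; any range selector could be substituted for `[5m]`.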
- Memory consumption: there isn't a specific Istio/TGI/Caikit metric for it (at least, I didn't find one). Users could compute it with something similar to:

```promql
sum(container_memory_working_set_bytes{pod='<isvc_predictor_pod_name>', namespace='<isvc_namespace>', container=''}) by (pod, namespace)
```
Moreover, there are additional metrics that deserve to be documented, such as `tgi_request_generated_tokens_count`.
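For instance, assuming that metric is a counter, a token-generation throughput could be derived from it; a sketch (the 5m window is an arbitrary example):

```promql
# Approximate tokens generated per second, averaged over 5 minutes
rate(tgi_request_generated_tokens_count[5m])
```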