triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend

Metrics "nv_inference_request_failure" value is always 0 even after getting 5xx at the client side #582

Open · ajagetia2001 opened this issue 3 weeks ago

ajagetia2001 commented 3 weeks ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

Deploy a llama3-7b model on Triton Server 2.46.0.
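For reference, a minimal launch sketch under assumed defaults; the image tag is taken from a later comment in this thread, and the model repository path is a placeholder, not the deployment used here:

```shell
# Start Triton with a TensorRT-LLM model repository (paths are placeholders)
docker run --rm --gpus all --net host \
  -v /path/to/triton_model_repo:/models \
  nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3 \
  tritonserver --model-repository=/models
```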

Expected behavior

The expectation is that the `nv_inference_request_failure` metric reports a non-zero failure count when the client is receiving 5xx responses.

actual behavior

Currently, the value is never updated; it stays at zero even after the server returns 5xx responses.

additional notes

```shell
curl --location --request POST 'http://sampletritonmodel-triton.genai-a100-mh-prod.fkcloud.in/v2/models/ensemble/generate' \
  --header 'Content-Type: application/json'
```

Response:

```
{"error":"failed to parse the request JSON buffer: The document is empty. at 0"}
```
Even after receiving this error, the failure metric count is not incremented.
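A minimal way to observe this, assuming a local server on the default ports (localhost stands in for the cluster URL above):

```shell
# An empty request body triggers the JSON parse error shown above
curl -s -X POST 'http://localhost:8000/v2/models/ensemble/generate' \
  -H 'Content-Type: application/json'

# Inspect the counter on Triton's Prometheus endpoint (port 8002 by default)
curl -s http://localhost:8002/metrics | grep nv_inference_request_failure
```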

ajagetia2001 commented 1 week ago
[Screenshot 2024-09-04 at 12:21:11 PM]

Reproduced this with the latest Triton server image as well (24.08-trtllm-python-py3). Are there configurations that we have to set to get the failure count?
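For what it's worth, a sketch of the `tritonserver` flags that control the metrics endpoint; the values shown are the defaults, so metrics are likely already enabled rather than something that needs to be switched on:

```shell
tritonserver --model-repository=/models \
  --allow-metrics=true \
  --metrics-port=8002
```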

ajagetia2001 commented 1 week ago
[Screenshot 2024-09-04 at 12:22:53 PM]

Failure count is always 0.

To reproduce this, I sent the following request payload with the final closing brace removed (invalid JSON):

```
{"text_input": "What is machine learning?", "max_tokens": 512, "bad_words": "", "stop_words": "", "temperature": 0.0, "top_p": 1.0, "frequency_penalty": 0
```
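An end-to-end sketch of this repro, again assuming a local server on the default ports:

```shell
# Send the payload with the final closing brace intentionally omitted (invalid JSON)
curl -s -X POST 'http://localhost:8000/v2/models/ensemble/generate' \
  -H 'Content-Type: application/json' \
  -d '{"text_input": "What is machine learning?", "max_tokens": 512, "bad_words": "", "stop_words": "", "temperature": 0.0, "top_p": 1.0, "frequency_penalty": 0'

# The failure counter still reads 0 after the request is rejected
curl -s http://localhost:8002/metrics | grep nv_inference_request_failure
```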