triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend

Metrics "nv_inference_request_failure" value is always 0 even after getting 5xx at the client side #582

Open · ajagetia2001 opened this issue 3 weeks ago

ajagetia2001 commented 3 weeks ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

Deploy a llama3-7b model on Triton Server 2.46.0.
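For reference, a minimal launch sketch under assumed defaults; the image tag is taken from a later comment in this thread, and the model repository path is a placeholder, not the deployment used here:

```shell
# Start Triton with a TensorRT-LLM model repository (paths are placeholders)
docker run --rm --gpus all --net host \
  -v /path/to/triton_model_repo:/models \
  nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3 \
  tritonserver --model-repository=/models
```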

Expected behavior

The expectation is that the `nv_inference_request_failure` metric reports a non-zero failure count when the client is receiving 5xx responses.

actual behavior

Currently, the value is never updated; it stays at zero even after the server returns 5xx responses.

additional notes

```shell
curl --location --request POST 'http://sampletritonmodel-triton.genai-a100-mh-prod.fkcloud.in/v2/models/ensemble/generate' \
  --header 'Content-Type: application/json'
```

Response:

```
{"error":"failed to parse the request JSON buffer: The document is empty. at 0"}
```
Even after receiving this error, the failure metric count is not incremented.
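A minimal way to observe this, assuming a local server on the default ports (localhost stands in for the cluster URL above):

```shell
# An empty request body triggers the JSON parse error shown above
curl -s -X POST 'http://localhost:8000/v2/models/ensemble/generate' \
  -H 'Content-Type: application/json'

# Inspect the counter on Triton's Prometheus endpoint (port 8002 by default)
curl -s http://localhost:8002/metrics | grep nv_inference_request_failure
```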

ajagetia2001 commented 1 week ago
[Screenshot 2024-09-04 at 12:21:11 PM]

Reproduced this with the latest Triton server image as well (24.08-trtllm-python-py3). Are there configurations that we have to set to get the failure count?
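For what it's worth, a sketch of the `tritonserver` flags that control the metrics endpoint; the values shown are the defaults, so metrics are likely already enabled rather than something that needs to be switched on:

```shell
tritonserver --model-repository=/models \
  --allow-metrics=true \
  --metrics-port=8002
```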

ajagetia2001 commented 1 week ago
[Screenshot 2024-09-04 at 12:22:53 PM]

Failure count is always 0.

To reproduce this, I sent the following request payload with the final closing brace removed (invalid JSON):

```
{"text_input": "What is machine learning?", "max_tokens": 512, "bad_words": "", "stop_words": "", "temperature": 0.0, "top_p": 1.0, "frequency_penalty": 0
```
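An end-to-end sketch of this repro, again assuming a local server on the default ports:

```shell
# Send the payload with the final closing brace intentionally omitted (invalid JSON)
curl -s -X POST 'http://localhost:8000/v2/models/ensemble/generate' \
  -H 'Content-Type: application/json' \
  -d '{"text_input": "What is machine learning?", "max_tokens": 512, "bad_words": "", "stop_words": "", "temperature": 0.0, "top_p": 1.0, "frequency_penalty": 0'

# The failure counter still reads 0 after the request is rejected
curl -s http://localhost:8002/metrics | grep nv_inference_request_failure
```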