ajagetia2001 opened this issue 3 weeks ago
Reproduced this with the latest Triton server image as well (24.08-trtllm-python-py3). Are there configurations we have to set to get the failure rate? The failure count is always 0.

To reproduce this, I sent the following request payload with the closing brace removed:

```json
{"text_input": "What is machine learning?", "max_tokens": 512, "bad_words": "", "stop_words": "", "temperature": 0.0, "top_p": 1.0, "frequency_penalty": 0
```
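The parse failure can be reproduced locally without the server. A minimal sketch (using a shortened, hypothetical version of the payload above) showing that the truncated JSON is rejected exactly as the server's parser rejects it:

```python
import json

# Shortened version of the reproduction payload, with the closing brace removed.
truncated = '{"text_input": "What is machine learning?", "max_tokens": 512, "temperature": 0.0'

# The same payload, well formed, for comparison.
complete = truncated + '}'

def parses(payload: str) -> bool:
    """Return True if the payload is valid JSON, False otherwise."""
    try:
        json.loads(payload)
        return True
    except json.JSONDecodeError:
        return False

print(parses(truncated))  # False -- this is the malformed request the server sees
print(parses(complete))   # True
```

Any request failing this check should be rejected by the server with a parse error, which is exactly the case where one would expect the failure counter to move.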
System Info
Who can help?
No response
Information
Tasks
Reproduction
Deploy a llama3-7b model on Triton Inference Server 2.46.0.
Expected behavior
The nv_inference_request_failure metric should report a non-zero count when clients receive 5xx responses.
Actual behavior

Currently this value is not updated; it stays at zero even after the server returns 5xx responses.

Additional notes
```shell
curl --location --request POST 'http://sampletritonmodel-triton.genai-a100-mh-prod.fkcloud.in/v2/models/ensemble/generate' \
  --header 'Content-Type: application/json'
```

This returns:

```
{"error":"failed to parse the request JSON buffer: The document is empty. at 0"}
```
Even after receiving this error, the failure metric count does not increase.
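To verify whether the counter moved, one can scrape Triton's Prometheus metrics endpoint (by default on port 8002) and pull out the per-model failure series. A minimal parsing sketch against an illustrative scrape — the exact labels and series on a real server may differ:

```python
import re

# Illustrative /metrics scrape; a real server exposes many more series.
metrics_text = """\
# HELP nv_inference_request_failure Number of failed inference requests
# TYPE nv_inference_request_failure counter
nv_inference_request_failure{model="ensemble",version="1"} 0
nv_inference_request_success{model="ensemble",version="1"} 42
"""

def failure_count(text: str, model: str) -> float:
    """Extract the nv_inference_request_failure value for a model
    from Prometheus exposition-format text."""
    pattern = rf'^nv_inference_request_failure\{{[^}}]*model="{model}"[^}}]*\}} (\S+)$'
    m = re.search(pattern, text, re.MULTILINE)
    if m is None:
        raise KeyError(f"no failure counter found for model {model!r}")
    return float(m.group(1))

print(failure_count(metrics_text, "ensemble"))  # 0.0 -- matches the behavior reported here
```

Scraping before and after sending the malformed request and comparing the two values would confirm whether the counter is being incremented at all, or whether parse-level rejections are simply never counted against the model.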