Closed: wanzhenchn closed this issue 1 month ago
Hi. I don't get your point. Could you explain more? It is better to share the end to end reproduced steps.
How do I pass sampling parameters such as `top_k` and `temperature` in the JSON data of a `requests.post()` call to the Triton server? I found that the server only accepts `top_p` and `repetition_penalty`; if I forcibly pass `top_k` and `temperature`, the output is identical to the input prompt. @byshiue
import json
import requests

response = requests.post(
    url="http://localhost:8000/v2/models/ensemble/generate",
    data=json.dumps({
        "max_tokens": 128,
        "top_p": 0.95,
        "repetition_penalty": 1.15,
        "stream": True,
        "text_input": "what is the machine learning?"
    }),
    stream=False
)
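For reference, a request body that also carries `top_k` and `temperature` can be sketched as below. The field names are an assumption: they must match the input names that the ensemble model's config.pbtxt actually exposes, so verify them against your deployed config before relying on this.

```python
import json

# Sketch of a request body including top_k and temperature.
# Field names are assumptions; they must match the ensemble's config.pbtxt.
payload = {
    "text_input": "what is machine learning?",
    "max_tokens": 128,
    "top_k": 16,
    "top_p": 0.95,
    "temperature": 0.7,
    "repetition_penalty": 1.15,
    "stream": False,
}

body = json.dumps(payload)   # serialize once; never hand-edit the JSON string
parsed = json.loads(body)    # sanity-check the body before POSTing it
print(parsed["top_k"], parsed["temperature"])
```

Serializing with `json.dumps` instead of writing the JSON by hand also rules out syntax errors such as trailing commas.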
I have tested different parameters on a llama model and got different results:
$ curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": "", "pad_id": 2, "end_id": 2}'
{"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,0.0,0.0,0.0,0.0,0.0],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"Machine learning is a subfield of artificial intelligence that focuses on the development of algorithms that can learn"}
$ curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": "", "pad_id": 2, "end_id": 2, "top_k": 16}'
{"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,0.0,0.0,0.0,0.0,0.0],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"The answer to this question depends on who you ask and when it was asked.\nIt has its"}
$ curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": "", "pad_id": 2, "end_id": 2, "top_k": 16, "repetition_penalty": 2.0}'
{"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,0.0,0.0,0.0,0.0,0.0],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"Why do we need it and how can you use in your applications\nThe term “machine” often"}
You could check whether you are really passing the parameters correctly. We now support the python backend; you can switch to it by changing https://github.com/triton-inference-server/tensorrtllm_backend/blob/ae52bce3ed8ecea468a16483e0dacd3d156ae4fe/all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt#L28 to
backend: "python"
and adding debug messages at https://github.com/triton-inference-server/tensorrtllm_backend/blob/ae52bce3ed8ecea468a16483e0dacd3d156ae4fe/all_models/inflight_batcher_llm/tensorrt_llm/1/model.py#L120 to check the input parameters.
Many thanks for your response.
@byshiue When I passed `top_p` and `repetition_penalty`, this error occurred:
{"error":"failed to parse the request JSON buffer: Missing a name for object member. at 196"}
Removing the trailing comma inside the `{}` solved the problem.
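The parse error above is a plain trailing-comma problem: JSON forbids a comma before the closing brace, and the server-side parser rejects it just like Python's `json` module does. A minimal sketch that reproduces the failure:

```python
import json

bad = '{"top_p": 0.95, "repetition_penalty": 1.15,}'   # trailing comma: invalid JSON
good = '{"top_p": 0.95, "repetition_penalty": 1.15}'   # same content, valid

try:
    json.loads(bad)
    parse_failed = False
except json.JSONDecodeError:
    parse_failed = True   # the malformed body is rejected before reaching the model

penalty = json.loads(good)["repetition_penalty"]
```

Building the request body with `json.dumps`, as in the Python snippet earlier in the thread, avoids this class of error entirely.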
System Info
GPU A30 (32GB)
Who can help?
@byshiue @schetlur-nv
Information
Tasks
Reproduction
How to post `top_k` and `temperature` for the Triton HTTP server? The response is the same as the input prompt.
Expected behavior
No
actual behavior
No
additional notes
No