dongs0104 opened 4 months ago
Sorry, I think I misunderstood n in trtllm: I had expected it to return multiple beams. According to this thread, https://github.com/triton-inference-server/tensorrtllm_backend/issues/499, I may need to send multiple requests to get n samples.
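If the backend returns only one sample per request, the workaround from that thread can be sketched as follows. This is a minimal illustration and not the backend's API: `sample_once` is a hypothetical callable standing in for one request, and passing the choice index as a seed is an assumption made so that the n samples differ.

```python
from typing import Callable, Dict, List


def emulate_n_choices(
    sample_once: Callable[[str, int], str], prompt: str, n: int
) -> List[Dict[str, object]]:
    """Send n independent single-sample requests and number them like OpenAI choices.

    `sample_once(prompt, seed)` is hypothetical: it stands in for one request to the
    backend. Using the choice index as the seed is an assumption, not backend behavior.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    return [
        {"index": i, "message": {"role": "assistant", "content": sample_once(prompt, i)}}
        for i in range(n)
    ]


# Usage with a fake backend that just echoes the prompt and seed.
choices = emulate_n_choices(lambda p, s: f"{p}-{s}", "hi", 3)
```

In practice the n requests could be issued concurrently; they are sequential here to keep the sketch short.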
By the way, do you know what choice.index looks like when using stream along with n>1?
Thanks for your hard work. I will run it against OpenAI and then attach the result :)
When I call the OpenAI API with n=2 and stream=True, it returns:
data: {"choices":[{"delta":{"role":"assistant"}, "finish_reason":null, "index":0}]}
data: {"choices":[{"delta":{"role":"assistant"}, "finish_reason":null, "index":1}]}
data: {"choices":[{"delta":{"content":"A"}, "finish_reason":null, "index":0}]}
data: {"choices":[{"delta":{"content":"A"}, "finish_reason":null, "index":0}]}
...
data: {"choices":[{"delta":{"content":"B"}, "finish_reason":null, "index":1}]}
data: {"choices":[{"delta":{"content":"B"}, "finish_reason":null, "index":1}]}
...
data: [DONE]
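On the client side, a stream like the one above can be split back into n answers by grouping the deltas on choice.index. A minimal sketch, assuming each line is an SSE `data:` line carrying a JSON chunk shaped like the ones shown (the function name `collect_stream` is hypothetical):

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def collect_stream(lines: Iterable[str]) -> Dict[int, str]:
    """Rebuild one text per choice from the SSE lines of a streamed chat completion."""
    texts: Dict[int, str] = defaultdict(str)
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Each choice entry carries a delta for one candidate, identified by index.
            piece = choice.get("delta", {}).get("content")
            # Role-only deltas (or null content) still register the index with "".
            texts[choice["index"]] += piece or ""
    return dict(texts)
```

Because chunks for different indices may interleave, keying on index (rather than on arrival order) is what keeps the n answers separate.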
@npuichigo they support num_return_sequences from v0.14.0 onward.
I did not find this option in https://platform.openai.com/docs/api-reference/chat/create
When I use the n option, the behavior is different from OpenAI: setting n switches to beam search.
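One way an OpenAI-compatible frontend could avoid the beam-search behavior is to map n to num_return_sequences while keeping the beam width at 1. This is a hypothetical sketch: the function and every backend key name other than num_return_sequences are illustrative assumptions, not the actual request fields of tensorrtllm_backend.

```python
from typing import Any, Dict


def to_backend_params(openai_req: Dict[str, Any]) -> Dict[str, Any]:
    """Translate OpenAI-style sampling fields into hypothetical backend parameters.

    Assumption: the backend accepts `num_return_sequences` (mentioned for v0.14.0+)
    separately from a beam width; the other key names are illustrative only.
    """
    n = int(openai_req.get("n", 1))
    if n < 1:
        raise ValueError("n must be >= 1")
    return {
        # Beam search stays off, so n means n independent samples, as with OpenAI.
        "beam_width": 1,
        "num_return_sequences": n,
        "temperature": openai_req.get("temperature", 1.0),
        "top_p": openai_req.get("top_p", 1.0),
        "stream": bool(openai_req.get("stream", False)),
    }
```

The design choice is to treat n purely as a sample count; beam search would then have to be requested through a separate, explicit option.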