npuichigo / openai_trtllm

OpenAI compatible API for TensorRT LLM triton backend
MIT License

all option is same as openai? #48

Open dongs0104 opened 4 months ago

dongs0104 commented 4 months ago

When I use the n option, it behaves differently from OpenAI.

When I set n, it switches to beam search.

npuichigo commented 4 months ago

Sorry, I think I misunderstood n in trtllm; I expected multiple beams to be returned. According to this thread, https://github.com/triton-inference-server/tensorrtllm_backend/issues/499, I may need to make multiple requests to return n samples.
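
The "multiple requests" idea could look roughly like the sketch below. It is not openai_trtllm's actual code: `generate_once` is a hypothetical stand-in for a single sampling request to the backend, and the `"stop"` finish reason is a placeholder. It only shows how n independent requests might be fanned out and merged into OpenAI-style choices.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def sample_n(generate_once: Callable[[str], str], prompt: str, n: int) -> List[Dict]:
    """Emulate OpenAI's `n` by sending n independent sampling requests.

    `generate_once` is a hypothetical callable that performs one backend
    request and returns the generated text.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    with ThreadPoolExecutor(max_workers=n) as pool:
        # map() preserves submission order, so position i becomes choice index i.
        texts = list(pool.map(generate_once, [prompt] * n))
    return [
        {
            "index": i,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop",  # placeholder; a real backend reports its own reason
        }
        for i, text in enumerate(texts)
    ]
```

The n samples only differ if the backend actually samples (for example with a non-zero temperature or distinct seeds); with greedy decoding every request would return the same text.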

npuichigo commented 4 months ago

By the way, do you know what choice.index would be like when using stream along with n>1?

dongs0104 commented 4 months ago

Thanks for your hard work. I will run it on OpenAI, then attach the result :)

dongs0104 commented 4 months ago

> By the way, do you know what choice.index would be like when using stream along with n>1?

When I use the OpenAI API with n == 2 and stream=True, it returns:

data: {"choices":[{"delta":{"role":"assistant"}, "finish_reason":null, "index":0}]}
data: {"choices":[{"delta":{"role":"assistant"}, "finish_reason":null, "index":1}]}
data: {"choices":[{"delta":{"content":"A"}, "finish_reason":null, "index":0}]}
data: {"choices":[{"delta":{"content":"A"}, "finish_reason":null, "index":0}]}
...
data: {"choices":[{"delta":{"content":"B"}, "finish_reason":null, "index":1}]}
data: {"choices":[{"delta":{"content":"B"}, "finish_reason":null, "index":1}]}
...
data: [DONE]
data: [DONE]
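
In a stream like this, chunks for different choices share one event stream and are told apart only by their index field. A minimal client-side sketch of reassembling them is below. It assumes each data: payload is valid JSON with double-quoted keys, and `collect_by_index` is a hypothetical helper name, not part of any library.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def collect_by_index(sse_lines: Iterable[str]) -> Dict[int, str]:
    """Concatenate streamed delta content separately for each choice index."""
    parts = defaultdict(list)
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            bucket = parts[choice["index"]]  # registers the index even for role-only chunks
            content = choice.get("delta", {}).get("content")
            if content:
                bucket.append(content)
    return {index: "".join(pieces) for index, pieces in sorted(parts.items())}
```

Because the result is keyed by index, it does not matter whether the server interleaves chunks from different choices or sends them in runs.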

dongs0104 commented 3 weeks ago

@npuichigo they support num_return_sequences from v0.14.0 onward

npuichigo commented 3 weeks ago

I did not find this option in https://platform.openai.com/docs/api-reference/chat/create