yunfeng-scale opened this issue 1 year ago
stop_words is implemented in TRT-LLM, but it seems like it isn't being sent to the model? https://github.com/triton-inference-server/tensorrtllm_backend/blob/release/0.5.0/all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt
I encountered the same problem.
@yunfeng-scale try to add parameter: "end_id": 2
not work for baichuan2
Hi @xiaoFine, I'm deploying Baichuan2-13B and encountered the same error as you. As @UncleFB suggested, I solved this problem by adding "end_id" when building the request body:
```json
{
  "text_input": "input",
  "max_tokens": 500,
  "bad_words": "",
  "stop_words": "",
  "end_id": 2
}
```
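For anyone hitting the same thing, here is a minimal sketch of building that request body in Python and sending it to Triton's HTTP generate endpoint. The server URL and the model name `ensemble` are assumptions specific to my deployment, and `end_id: 2` is Baichuan2's EOS token id per the comments above; check your own tokenizer's `eos_token_id`.

```python
import json

# Request body for Triton's /v2/models/<model>/generate endpoint.
# end_id = 2 is an assumption for Baichuan2; verify against your
# tokenizer's eos_token_id.
payload = {
    "text_input": "input",
    "max_tokens": 500,
    "bad_words": "",
    "stop_words": "",
    "end_id": 2,
}

body = json.dumps(payload)
print(body)

# To actually send it (requires the `requests` package and a running
# Triton server; URL and model name are assumptions):
# import requests
# resp = requests.post(
#     "http://localhost:8000/v2/models/ensemble/generate",
#     data=body,
# )
# print(resp.json()["text_output"])
```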
Could you try on latest main branch https://github.com/triton-inference-server/tensorrtllm_backend/tree/main, the commit is https://github.com/triton-inference-server/tensorrtllm_backend/commit/37ed967216bdfa0ffce038d368675c93966172ea.
Hi, it doesn't seem like "stop_words" is respected by the generate endpoint. I get the same output with and without this field. I also wasn't sure whether I should supply a list instead of a string, so I tried that as well.
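For reference, these are the two encodings I tried. Which form the backend honors likely depends on the tensorrtllm_backend version; both the stop sequences shown and the assumption that a list is accepted at all are things to experiment with, not a confirmed API contract.

```python
import json

# Variant 1: stop_words as a single string (as in the config.pbtxt example).
as_string = {
    "text_input": "input",
    "max_tokens": 500,
    "bad_words": "",
    "stop_words": "</s>",  # assumed stop sequence, adjust for your model
    "end_id": 2,
}

# Variant 2: stop_words as a list of strings (untested assumption that
# the endpoint accepts this shape).
as_list = {
    "text_input": "input",
    "max_tokens": 500,
    "bad_words": [],
    "stop_words": ["</s>", "\n\n"],
    "end_id": 2,
}

for payload in (as_string, as_list):
    print(json.dumps(payload))
```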