triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

Param "stop_words" not respected in v2/models/ensemble/generate endpoint #57

Open yunfeng-scale opened 1 year ago

yunfeng-scale commented 1 year ago

Hi, it doesn't seem like "stop_words" is respected in the generate endpoint.

I'm getting the same output with and without this field:

curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": "branch"}'
{"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"<s> What is machine learning? Machine learning is a branch of artificial intelligence that allows computers to learn without being explicitly programmed. Machine"}

I wasn't sure whether I should supply a list instead, so I tried that as well:

curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": ["branch"]}'
{"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"<s> What is machine learning? Machine learning is a branch of artificial intelligence that allows computers to learn without being explicitly programmed. Machine"}
yunfeng-scale commented 1 year ago

stop_words is implemented in TRT-LLM; it seems like it's just not being sent to the model? https://github.com/triton-inference-server/tensorrtllm_backend/blob/release/0.5.0/all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt
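For reference, the tensorrt_llm model in that config does appear to declare a stop-word input tensor, so the suspicion is that the ensemble/preprocessing step isn't populating it from the plain-text field. A rough sketch of what that input declaration looks like (reproduced from memory of the 0.5.0 layout rather than copied from the file, so the field names and dims should be verified against the actual config.pbtxt):

  {
    name: "stop_words_list"
    data_type: TYPE_INT32
    dims: [ 2, -1 ]
    optional: true
  }

If that is the shape, the plain-text "stop_words" value sent to the generate endpoint would need to be tokenized by the preprocessing model and mapped into this tensor before the backend can act on it.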

UncleFB commented 1 year ago

I encountered the same problem.

UncleFB commented 1 year ago

@yunfeng-scale try adding the parameter "end_id": 2

xiaoFine commented 1 year ago

> @yunfeng-scale try adding the parameter "end_id": 2

This doesn't work for baichuan2.

Luis-xu commented 1 year ago

> This doesn't work for baichuan2.

Hi @xiaoFine, I'm deploying baichuan2-13B and encountered the same error as you. As @UncleFB suggested, I solved the problem by adding "end_id" when building the request body:

{ "text_input": "input", "max_tokens": 500, "bad_words": "", "stop_words": "", "end_id": 2 }
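For completeness, the same fix applied to the original curl request would look roughly like this (end_id is set to 2 here because that is the EOS token id for Llama/Baichuan-family tokenizers; check your own tokenizer's EOS id before copying):

curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": "branch", "end_id": 2}'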

byshiue commented 12 months ago

Could you try the latest main branch (https://github.com/triton-inference-server/tensorrtllm_backend/tree/main)? The commit is https://github.com/triton-inference-server/tensorrtllm_backend/commit/37ed967216bdfa0ffce038d368675c93966172ea.