triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

When the input contains end_id, the last character of output is repeated. #113

Open PAOPAO6 opened 1 year ago

PAOPAO6 commented 1 year ago

model: baichuan1 13b enable inflight_fused_batching

good case post: curl -X POST 10.60.133.200:8030/v2/models/ensemble/generate -d '{"max_tokens": 90, "bad_words": "", "stop_words": "", "text_input": "What is machine learning?"}'

response: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":" What is machine learning?\nMachine learning is a branch of artificial intelligence that focuses on developing algorithms that can learn from data and improve performance over time. It is a subset of artificial intelligence that focuses on the development of algorithms that can learn from data and improve performance over time. Machine learning algorithms are used to identify patterns in data and make predictions based on those patterns.</s>100% of the"}

bad case post: curl -X POST 10.60.133.200:8030/v2/models/ensemble/generate -d '{"max_tokens": 90, "bad_words": "", "stop_words": "", "end_id": 2, "text_input": "What is machine learning?"}'

response: {"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"What is machine learning?\nMachine learning is a branch of artificial intelligence that focuses on developing algorithms that can learn from data and improve performance over time. It is a subset of artificial intelligence that focuses on the development of algorithms that can learn from data and improve performance over time. Machine learning algorithms are used to identify patterns in data and make predictions based on those patterns.."}

PAOPAO6 commented 1 year ago

@byshiue

BasicCoder commented 1 year ago

The latest main branch already supports the exclude_input_in_output parameter. If you are using an old version, you can refer to https://github.com/triton-inference-server/tensorrtllm_backend/pull/95. With that code, taking seq_len - 1 tokens recovers the true output.
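The workaround above (keeping seq_len - 1 tokens so the trailing end_id token is not decoded into the text output) can be sketched as a small postprocessing step. This is a hypothetical illustration, not the backend's actual postprocessing code; the names output_ids, seq_len, and trim_output are assumptions for the example, and END_ID = 2 matches the end_id used in the bad-case request above.

```python
# Hypothetical sketch of the seq_len - 1 workaround: drop the trailing
# end_id token from the output ids before detokenizing, so the end token
# is not rendered (or re-decoded as a stray repeated character).

END_ID = 2  # end_id passed in the request above; assumption for this sketch


def trim_output(output_ids, seq_len, end_id=END_ID):
    """Return the output ids with any trailing end_id removed."""
    ids = list(output_ids[:seq_len])
    # On older versions the last position holds end_id itself, so keeping
    # seq_len - 1 tokens yields the true output.
    if ids and ids[-1] == end_id:
        ids = ids[:-1]
    return ids


# Example: the final end_id (2) is stripped before detokenization.
print(trim_output([5, 17, 42, 2], 4))  # -> [5, 17, 42]
```

On newer versions of the backend, setting exclude_input_in_output and letting the model handle the end token makes this manual trimming unnecessary.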

PAOPAO6 commented 1 year ago

> The latest main branch already supports the exclude_input_in_output parameter. If you are using an old version, you can refer to #95. With that code, taking seq_len - 1 tokens recovers the true output.

Thank you very much.