Open PAOPAO6 opened 1 year ago
@byshiue
The latest main branch already supports the exclude_input_in_output
parameter. If you are using an older version, you can refer to https://github.com/triton-inference-server/tensorrtllm_backend/pull/95. With that code, using seq_len-1
gives the correct output.
Model: baichuan1 13b, with inflight_fused_batching enabled
Good case, POST request:
curl -X POST 10.60.133.200:8030/v2/models/ensemble/generate -d '{"max_tokens": 90, "bad_words": "", "stop_words": "", "text_input": "What is machine learning?"}'
Response:
{"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":" What is machine learning?\nMachine learning is a branch of artificial intelligence that focuses on developing algorithms that can learn from data and improve performance over time. It is a subset of artificial intelligence that focuses on the development of algorithms that can learn from data and improve performance over time. Machine learning algorithms are used to identify patterns in data and make predictions based on those patterns.</s>100% of the"}
Bad case, POST request:
curl -X POST 10.60.133.200:8030/v2/models/ensemble/generate -d '{"max_tokens": 90, "bad_words": "", "stop_words": "", "end_id": 2, "text_input": "What is machine learning?"}'
Response:
{"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"What is machine learning?\nMachine learning is a branch of artificial intelligence that focuses on developing algorithms that can learn from data and improve performance over time. It is a subset of artificial intelligence that focuses on the development of algorithms that can learn from data and improve performance over time. Machine learning algorithms are used to identify patterns in data and make predictions based on those patterns.."}
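For anyone stuck on an older version, here is a minimal client-side sketch of the clean-up that the two responses above would need: drop the echoed prompt and cut the text at the end-of-sequence marker. This is only a workaround under assumptions, not what the backend does internally. The function name `clean_text_output` is hypothetical, and it assumes the EOS token is rendered as the literal string `</s>`, as in the first response.

```python
import json

# Assumption: the EOS token shows up in text_output as this literal string.
EOS_TEXT = "</s>"


def clean_text_output(response_body: str, prompt: str, eos_text: str = EOS_TEXT) -> str:
    """Return only the generated continuation from a generate response body.

    Hypothetical helper: strips the echoed prompt and anything after the EOS marker.
    """
    text = json.loads(response_body)["text_output"]

    # The output may start with a leading space before the echoed prompt.
    text = text.lstrip()
    if text.startswith(prompt):
        text = text[len(prompt):]

    # Discard the EOS marker and everything generated after it.
    eos_pos = text.find(eos_text)
    if eos_pos != -1:
        text = text[:eos_pos]

    return text.strip()
```

Usage would look like `clean_text_output(raw_json, "What is machine learning?")`, where `raw_json` is the response body returned by the curl call.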