Tasks
[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)
Reproduction
curl -X POST localhost:8000/v2/models/tensorrt_llm_bls/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": "", "temperature": 100.0, "top_k": 100}'
output
{"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"tensorrt_llm_bls","model_version":"1","output_log_probs":[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],"text_output":"- Quora\nMachine learning is a field of artificial intelligence which enables machines to learn without being specifically"}
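To make the repro easier to quantify, here is a sketch (assumes the server is reachable at localhost:8000 as in the curl above; endpoint path and the `text_input`/`text_output` field names are taken from the request and response shown) that sends the same request several times and counts distinct `text_output` values. With temperature 100.0 and top_k 100 nearly every response should differ; with the bug, all ten come back identical.

```python
import json
import urllib.request
from collections import Counter

# Same endpoint and payload as the curl reproduction above.
URL = "http://localhost:8000/v2/models/tensorrt_llm_bls/generate"
PAYLOAD = {
    "text_input": "What is machine learning?",
    "max_tokens": 20,
    "bad_words": "",
    "stop_words": "",
    "temperature": 100.0,
    "top_k": 100,
}

def generate(url=URL, payload=PAYLOAD):
    # POST the JSON payload and return the generated text.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text_output"]

def distinct_outputs(outputs):
    # With working high-temperature sampling, almost all responses
    # should be distinct; if sampling params are ignored, all are equal.
    return len(Counter(outputs))

if __name__ == "__main__":
    outputs = [generate() for _ in range(10)]
    print(f"{distinct_outputs(outputs)} distinct outputs out of {len(outputs)}")
```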
Expected behavior
Given temperature 100.0 and top_k 100, one would expect a nonsensical (not the canonical) answer.
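For context on why this expectation holds, a minimal sketch (plain Python, made-up illustrative logits) of how temperature rescales the sampling distribution: dividing logits by temperature 100.0 flattens the softmax to near-uniform over the candidates, so repeatedly getting the canonical completion would be very unlikely if the parameter were actually applied.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by 1/temperature before normalizing;
    # higher temperature flattens the distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for four candidate tokens (made-up values).
logits = [10.0, 2.0, 1.0, 0.5]

p_default = softmax(logits, temperature=1.0)
p_hot = softmax(logits, temperature=100.0)

print(round(p_default[0], 3))  # top token dominates, close to 1.0
print(round(p_hot[0], 3))      # near-uniform, close to 1/4
```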
actual behavior
See the reproduction above: despite temperature 100.0 and top_k 100, tensorrt_llm_bls returns the canonical answer, i.e. the sampling parameters appear to be ignored.
additional notes
The ensemble model works as expected. I sent the following request to the same running engine just a few seconds after the tensorrt_llm_bls request above:
curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": "", "temperature": 100.0, "top_k": 100}'
System Info
- Triton + TRT-LLM 0.9.0
- llama2 70b model, fp8 quantization
- 2x H100 80GB, tp 2, pp 1
- config.pbtxt for tensorrt_llm_bls (otherwise unchanged):