triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend

Deepseek model streaming mode with Chinese character �? #493

Open activezhao opened 3 months ago

activezhao commented 3 months ago

System Info

CPU x86_64

GPU NVIDIA L20

TensorRT-LLM branch: v0.8.0

CUDA: NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.3

Who can help?

@kaiyux @byshiue @schetlur-nv


Reproduction

I use the following commands.

python /data/tensorrt_llm/examples/llama/convert_checkpoint.py --model_dir /data/deepseek-coder-6.7b-base/ \
                            --output_dir /data/deepseek-coder-6.7b-base-tp2-bs32 \
                            --dtype float16 \
                            --tp_size 2 \
                            --workers 2

trtllm-build --checkpoint_dir /data/deepseek-coder-6.7b-base-tp2-bs32 \
            --output_dir /data/trt-engines-deepseek-coder-6.7b-base-bs32/2-gpu/  \
            --gemm_plugin float16 \
            --max_input_len 8192 \
            --max_output_len 1024 \
            --gpt_attention_plugin float16 \
            --max_batch_size 32 

Here is the request.

curl -X POST localhost:8000/v2/models/ensemble/generate_stream -d '{"text_input": "package gtin\n//2\n//外用液体剂\n//2018-08-15 16:12:50\n//3\n//颗粒剂\n//2018-08-15 16:12:50\n//4\n//注射剂\n//2018-08-15 16:12:50\n//5\n//口服散剂\n//2018-08-15 16:12:50\n//6\n//滴丸剂\n//2018-08-15 16:12:50\n//7\n//灌肠剂\n//2018-08-15 16:12:50\n//8\n//栓剂\n//2018-08-15 16:12:50\n//9\n//缓释控释剂型\n//2018-08-15 16:12:50\n//10\n//缓控释颗粒剂\n//2018-08-15 16:12:50\n//11\n//乳膏剂\n//2018-08-15 16:12:50\n//12\n//贴剂\n//2018-08-15 16:12:50\n//13\n//外用冻干制剂\n//2018-08-15 16:12:50\n//14\n//吸入剂\n//2018-08-15 16:12:50\n//15\n//凝胶剂\n//2018-08-15 16:12:50\n//16\n//片剂\n//2018-08-15 16:12:50\n//17\n//局部用散剂\n//2018-08-15 16:12:50\n//18\n//溶液剂\n//2018-08-15 16:12:50\n//19\n//胶囊剂\n//2018-08-22 17:49:54\n//20\n//胶丸剂\n//2018-12-20 15:20:56\n\n// DosageFormMap 剂型\nvar DosageFormMap = map[int]string{1: \"口服常释剂型\", 2: \"外用液体剂\", ", "max_tokens": 50, "bad_words": "", "stop_words": "", "stream": true, "temperature": 0.6, "return_log_probs": true, "generation_logits": true}'

But I find that the Chinese characters in the inference results are garbled.

Expected behavior

The output result is normal.

Actual behavior

Some Chinese characters in the inference results are garbled.

data: {"context_logits":0.0,"cum_log_probs":-0.0000637789344182238,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":":"}

data: {"context_logits":0.0,"cum_log_probs":-0.00006509023660328239,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":" \""}

data: {"context_logits":0.0,"cum_log_probs":-0.00014126786845736206,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"颗"}

data: {"context_logits":0.0,"cum_log_probs":-0.0001422215427737683,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"粒"}

data: {"context_logits":0.0,"cum_log_probs":-0.0001441288914065808,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"剂"}

data: {"context_logits":0.0,"cum_log_probs":-0.0001535464689368382,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"\","}

data: {"context_logits":0.0,"cum_log_probs":-0.00470391009002924,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":" "}

data: {"context_logits":0.0,"cum_log_probs":-0.004715234972536564,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"4"}

data: {"context_logits":0.0,"cum_log_probs":-0.004716307856142521,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":":"}

data: {"context_logits":0.0,"cum_log_probs":-0.004718453623354435,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":" \""}

data: {"context_logits":0.0,"cum_log_probs":-0.004728944040834904,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"注"}

data: {"context_logits":0.0,"cum_log_probs":-0.00472989771515131,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"射"}

data: {"context_logits":0.0,"cum_log_probs":-0.004731328226625919,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"剂"}

data: {"context_logits":0.0,"cum_log_probs":-0.004758389201015234,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"\","}

data: {"context_logits":0.0,"cum_log_probs":-0.017934530973434449,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":" "}

data: {"context_logits":0.0,"cum_log_probs":-0.01794406771659851,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502,-0.000009536788638797589],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"5"}

data: {"context_logits":0.0,"cum_log_probs":-0.017945021390914918,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502,-0.000009536788638797589,-9.536747711536009e-7],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":":"}

data: {"context_logits":0.0,"cum_log_probs":-0.017946213483810426,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502,-0.000009536788638797589,-9.536747711536009e-7,-0.0000011920935776288389],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":" \""}

data: {"context_logits":0.0,"cum_log_probs":-0.0179499089717865,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502,-0.000009536788638797589,-9.536747711536009e-7,-0.0000011920935776288389,-0.000003695494797284482],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"口"}

data: {"context_logits":0.0,"cum_log_probs":-0.017950862646102907,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502,-0.000009536788638797589,-9.536747711536009e-7,-0.0000011920935776288389,-0.000003695494797284482,-9.536747711536009e-7],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"服"}

data: {"context_logits":0.0,"cum_log_probs":-0.017952173948287965,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502,-0.000009536788638797589,-9.536747711536009e-7,-0.0000011920935776288389,-0.000003695494797284482,-9.536747711536009e-7,-0.0000013113030945532956],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"散"}

data: {"context_logits":0.0,"cum_log_probs":-0.01795324683189392,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502,-0.000009536788638797589,-9.536747711536009e-7,-0.0000011920935776288389,-0.000003695494797284482,-9.536747711536009e-7,-0.0000013113030945532956,-0.0000010728841743912199],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"剂"}

data: {"context_logits":0.0,"cum_log_probs":-0.017961472272872926,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502,-0.000009536788638797589,-9.536747711536009e-7,-0.0000011920935776288389,-0.000003695494797284482,-9.536747711536009e-7,-0.0000013113030945532956,-0.0000010728841743912199,-0.000008225474630307872],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"\","}

data: {"context_logits":0.0,"cum_log_probs":-0.04243664816021919,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502,-0.000009536788638797589,-9.536747711536009e-7,-0.0000011920935776288389,-0.000003695494797284482,-9.536747711536009e-7,-0.0000013113030945532956,-0.0000010728841743912199,-0.000008225474630307872,-0.024475175887346269],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":" "}

data: {"context_logits":0.0,"cum_log_probs":-0.042443323880434039,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502,-0.000009536788638797589,-9.536747711536009e-7,-0.0000011920935776288389,-0.000003695494797284482,-9.536747711536009e-7,-0.0000013113030945532956,-0.0000010728841743912199,-0.000008225474630307872,-0.024475175887346269,-0.0000066757424974639438],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"6"}

data: {"context_logits":0.0,"cum_log_probs":-0.04244427755475044,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502,-0.000009536788638797589,-9.536747711536009e-7,-0.0000011920935776288389,-0.000003695494797284482,-9.536747711536009e-7,-0.0000013113030945532956,-0.0000010728841743912199,-0.000008225474630307872,-0.024475175887346269,-0.0000066757424974639438,-9.536747711536009e-7],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":":"}

data: {"context_logits":0.0,"cum_log_probs":-0.04244594648480415,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502,-0.000009536788638797589,-9.536747711536009e-7,-0.0000011920935776288389,-0.000003695494797284482,-9.536747711536009e-7,-0.0000013113030945532956,-0.0000010728841743912199,-0.000008225474630307872,-0.024475175887346269,-0.0000066757424974639438,-9.536747711536009e-7,-0.0000016689314179529902],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":" \""}

data: {"context_logits":0.0,"cum_log_probs":-0.04246931150555611,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502,-0.000009536788638797589,-9.536747711536009e-7,-0.0000011920935776288389,-0.000003695494797284482,-9.536747711536009e-7,-0.0000013113030945532956,-0.0000010728841743912199,-0.000008225474630307872,-0.024475175887346269,-0.0000066757424974639438,-9.536747711536009e-7,-0.0000016689314179529902,-0.000023365293600363658],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"滴"}

data: {"context_logits":0.0,"cum_log_probs":-0.04247026517987251,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502,-0.000009536788638797589,-9.536747711536009e-7,-0.0000011920935776288389,-0.000003695494797284482,-9.536747711536009e-7,-0.0000013113030945532956,-0.0000010728841743912199,-0.000008225474630307872,-0.024475175887346269,-0.0000066757424974639438,-9.536747711536009e-7,-0.0000016689314179529902,-0.000023365293600363658,-9.536747711536009e-7],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"�"}

data: {"context_logits":0.0,"cum_log_probs":-0.04247121885418892,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502,-0.000009536788638797589,-9.536747711536009e-7,-0.0000011920935776288389,-0.000003695494797284482,-9.536747711536009e-7,-0.0000013113030945532956,-0.0000010728841743912199,-0.000008225474630307872,-0.024475175887346269,-0.0000066757424974639438,-9.536747711536009e-7,-0.0000016689314179529902,-0.000023365293600363658,-9.536747711536009e-7,-9.536747711536009e-7],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"�"}

data: {"context_logits":0.0,"cum_log_probs":-0.04247396066784859,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[-0.00006270605081226677,-0.0000010728841743912199,-0.0000013113030945532956,-0.00007617763913003728,-9.536747711536009e-7,-0.0000019073504518019037,-0.000009417578439752106,-0.004550363402813673,-0.000011324947081448045,-0.0000010728841743912199,-0.000002145769485650817,-0.000010490472959645558,-9.536747711536009e-7,-0.0000014305124977909146,-0.000027060874344897458,-0.013176142238080502,-0.000009536788638797589,-9.536747711536009e-7,-0.0000011920935776288389,-0.000003695494797284482,-9.536747711536009e-7,-0.0000013113030945532956,-0.0000010728841743912199,-0.000008225474630307872,-0.024475175887346269,-0.0000066757424974639438,-9.536747711536009e-7,-0.0000016689314179529902,-0.000023365293600363658,-9.536747711536009e-7,-9.536747711536009e-7,-0.0000027418175250204514],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"剂"}

Additional notes

I suspect there is a problem with character conversion after decoding.

I hope there is a way to solve it.

Thanks.

handoku commented 3 months ago

It's not a bug; it's a limitation of the tokenizer. Some characters need two token_ids to represent them, so you have to decode them together. I'm not sure whether the latest tensorrt_llm_bls has enhanced streaming decoding, but I did a quick-fix workaround by decoding token_ids on the client side.
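
To see the limitation concretely: with a byte-level tokenizer, a single Chinese character can span two token ids, and each id decoded on its own yields the U+FFFD replacement character, which matches the 滴/�/�/剂 sequence in the stream above. A minimal sketch, assuming the DeepSeek tokenizer from this issue is available locally:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/data/deepseek-coder-6.7b-base")

# The string round-trips fine when decoded as a whole...
ids = tokenizer.encode("滴丸剂", add_special_tokens=False)
print(tokenizer.decode(ids))                 # 滴丸剂

# ...but decoding each id individually yields '�' for any character
# whose UTF-8 bytes are split across two tokens.
print([tokenizer.decode([i]) for i in ids])  # e.g. ['滴', '�', '�', '剂']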

activezhao commented 3 months ago

> It's not a bug; it's a limitation of the tokenizer. Some characters need two token_ids to represent them, so you have to decode them together. I'm not sure whether the latest tensorrt_llm_bls has enhanced streaming decoding, but I did a quick-fix workaround by decoding token_ids on the client side.

Hi @handoku, could you please tell me how to solve this problem?

In fact, I found that someone else has met this problem, and they suggested using the BLS model to solve it, but I do not know the details of the solution.

Thanks so much.

handoku commented 3 months ago

> Hi @handoku, could you please tell me how to solve this problem? In fact, I found that someone else has met this problem, and they suggested using the BLS model to solve it, but I do not know the details of the solution.

Try setting accumulate_tokens following this, but it seems that it will decode all generated tokens again every time a single new token_id shows up.

activezhao commented 3 months ago

> Try setting accumulate_tokens following this, but it seems that it will decode all generated tokens again every time a single new token_id shows up.

@handoku Got it, thank you so much, I will try it.

activezhao commented 3 months ago

> Try setting accumulate_tokens following this, but it seems that it will decode all generated tokens again every time a single new token_id shows up.

Hi @handoku, I have learned what the accumulate_tokens parameter does:

> The BLS model has an optional parameter accumulate_tokens which can be used in streaming mode to call the postprocessing model with all accumulated tokens, instead of only one token. This might be necessary for certain tokenizers.

tensorrt_llm_bls/config.pbtxt

parameters: {
  key: "accumulate_tokens"
  value: {
    string_value: "${accumulate_tokens}"
  }
}
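
For reference, this ${accumulate_tokens} placeholder is normally substituted while preparing the model repository, e.g. with the repo's fill_template.py tool (the config path here is illustrative):

python3 tools/fill_template.py -i tensorrt_llm_bls/config.pbtxt accumulate_tokens:true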

I set accumulate_tokens to true as below, but it does not work; � still occurs.

parameters: {
  key: "accumulate_tokens"
  value: {
    string_value: "true"
  }
}

Could you please give me more suggestions?

Thanks.

handoku commented 3 months ago

Have you tried sending the request with stream=false, to confirm whether it's a tokenizer decoding issue or an accuracy issue?

activezhao commented 3 months ago

> Have you tried sending the request with stream=false, to confirm whether it's a tokenizer decoding issue or an accuracy issue?

@handoku Yes, if stream=false, the Chinese characters in the inference results are not garbled.

But I need to use streaming mode.

In fact, I tried accumulate_tokens mode, and the garbled characters do get replaced, but in streaming mode I need each response to carry just the one new word, not the whole accumulated response every time.

data: {"context_logits":0.0,"cum_log_probs":-0.35588228702545168,"generation_logits":0.0,"model_name":"tensorrt_llm_bls","model_version":"1","output_log_probs":[-0.009281360544264317,-0.0006639180355705321,-0.0002992600784637034,-0.0067407069727778439,-0.000007033372639853042,-0.00038464312092401087,-0.0027929767966270448,-0.04418620467185974,-0.0034859071020036937,-0.0005986097385175526,-0.0004702720034401864,-0.0023520810063928367,-0.00009978315210901201,-0.0003041491436306387,-0.003698743646964431,-0.06340079754590988,-0.0034113232977688314,-0.00033312622690573335,-0.00041213183430954814,-0.0017900982638821006,-0.00021764023404102772,-0.0004938868223689497,-0.0001466381800128147,-0.0022674258798360826,-0.08003108203411102,-0.0024388942401856186,-0.00025841951719485223,-0.0005590689834207296,-0.00748544093221426,-0.00010121380910277367,-9.536747711536009e-7,-0.0005720701883547008,-0.001312699867412448,-0.049526430666446689,-0.002000842010602355,-0.00032489807927049696,-0.0005064100841991603,-0.008147197775542736,-0.0008537836838513613,-0.00040926961810328066,-0.0009865857427939773,-0.04962138459086418,-0.0016011294210329652,-0.0001255352544831112,-0.0002481649280525744,-0.0009321144898422062],"text_output":"3: \"颗粒剂\", 4: \"注射剂\", 5: \"口服散剂\", 6: \"滴丸剂\", 7: \"灌肠剂\", 8: \"�"}

data: {"context_logits":0.0,"cum_log_probs":-0.35588598251342776,"generation_logits":0.0,"model_name":"tensorrt_llm_bls","model_version":"1","output_log_probs":[-0.009281360544264317,-0.0006639180355705321,-0.0002992600784637034,-0.0067407069727778439,-0.000007033372639853042,-0.00038464312092401087,-0.0027929767966270448,-0.04418620467185974,-0.0034859071020036937,-0.0005986097385175526,-0.0004702720034401864,-0.0023520810063928367,-0.00009978315210901201,-0.0003041491436306387,-0.003698743646964431,-0.06340079754590988,-0.0034113232977688314,-0.00033312622690573335,-0.00041213183430954814,-0.0017900982638821006,-0.00021764023404102772,-0.0004938868223689497,-0.0001466381800128147,-0.0022674258798360826,-0.08003108203411102,-0.0024388942401856186,-0.00025841951719485223,-0.0005590689834207296,-0.00748544093221426,-0.00010121380910277367,-9.536747711536009e-7,-0.0005720701883547008,-0.001312699867412448,-0.049526430666446689,-0.002000842010602355,-0.00032489807927049696,-0.0005064100841991603,-0.008147197775542736,-0.0008537836838513613,-0.00040926961810328066,-0.0009865857427939773,-0.04962138459086418,-0.0016011294210329652,-0.0001255352544831112,-0.0002481649280525744,-0.0009321144898422062,-0.000003695494797284482],"text_output":"3: \"颗粒剂\", 4: \"注射剂\", 5: \"口服散剂\", 6: \"滴丸剂\", 7: \"灌肠剂\", 8: \"栓"}

data: {"context_logits":0.0,"cum_log_probs":-0.3562590479850769,"generation_logits":0.0,"model_name":"tensorrt_llm_bls","model_version":"1","output_log_probs":[-0.009281360544264317,-0.0006639180355705321,-0.0002992600784637034,-0.0067407069727778439,-0.000007033372639853042,-0.00038464312092401087,-0.0027929767966270448,-0.04418620467185974,-0.0034859071020036937,-0.0005986097385175526,-0.0004702720034401864,-0.0023520810063928367,-0.00009978315210901201,-0.0003041491436306387,-0.003698743646964431,-0.06340079754590988,-0.0034113232977688314,-0.00033312622690573335,-0.00041213183430954814,-0.0017900982638821006,-0.00021764023404102772,-0.0004938868223689497,-0.0001466381800128147,-0.0022674258798360826,-0.08003108203411102,-0.0024388942401856186,-0.00025841951719485223,-0.0005590689834207296,-0.00748544093221426,-0.00010121380910277367,-9.536747711536009e-7,-0.0005720701883547008,-0.001312699867412448,-0.049526430666446689,-0.002000842010602355,-0.00032489807927049696,-0.0005064100841991603,-0.008147197775542736,-0.0008537836838513613,-0.00040926961810328066,-0.0009865857427939773,-0.04962138459086418,-0.0016011294210329652,-0.0001255352544831112,-0.0002481649280525744,-0.0009321144898422062,-0.000003695494797284482,-0.0003730754542630166],"text_output":"3: \"颗粒剂\", 4: \"注射剂\", 5: \"口服散剂\", 6: \"滴丸剂\", 7: \"灌肠剂\", 8: \"栓剂"}

Do you have a better solution?

Thanks.
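
One client-side workaround at this point: since each accumulate_tokens chunk carries the full accumulated text_output, the client can diff against the previously seen text and hold the delta back while it still ends in U+FFFD. A rough sketch (hypothetical helper, not part of the backend):

class DeltaEmitter:
    """Turns accumulated text_output chunks back into incremental deltas."""

    def __init__(self):
        self.seen = ""

    def push(self, accumulated: str) -> str:
        delta = accumulated[len(self.seen):]
        if delta.endswith("\ufffd"):
            # A trailing '�' means the last character is still incomplete;
            # the next chunk should replace it with the real character.
            return ""
        self.seen = accumulated
        return delta

With the three chunks above, this emits nothing for the chunk ending in "�" and then emits "栓" once the next chunk arrives.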

handoku commented 3 months ago

> @handoku Yes, if stream=false, the Chinese characters in the inference results are not garbled. But I need to use streaming mode. In fact, I tried accumulate_tokens mode, and the garbled characters do get replaced, but in streaming mode I need each response to carry just the one new word, not the whole accumulated response. Do you have a better solution?

This is as expected. accumulate_tokens only ensures that the final result is correct. As said before, it's a limitation of the tokenizer.

A simple way is to add some dirty work on the client side, like removing redundant words or discarding abnormal sentences.

Or you could add check logic in tensorrt_llm_bls: if the decoded tokens are normal, send a response with one word; otherwise do nothing. To do this you probably need to maintain the state of the last several decoded tokens of each request. It is not as easy as turning on accumulate_tokens, but please look through the BLS code; you can do it.

Or wait for someone else's more convenient solution.

activezhao commented 3 months ago

> This is as expected. accumulate_tokens only ensures that the final result is correct. As said before, it's a limitation of the tokenizer. … Or you could add check logic in tensorrt_llm_bls: if the decoded tokens are normal, send a response with one word; otherwise do nothing.

@handoku Thanks for your reply.

I also think adding a sliding window during decoding may be a good approach.

E.g., create a sliding window of length 4 and accumulate each new token into it. We then decode only the tokens within the window and return the first character.

In this way, we can ensure that there are no garbled characters while also avoiding decoding all tokens every time.

What do you think?

I'm actually looking at the decode.py file in bls and trying to figure out the best way to handle this.
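
A rough sketch of that idea, as a hypothetical helper rather than the actual BLS code: keep a small buffer of pending token ids, decode only the buffer, and flush it once the text no longer ends in U+FFFD. This assumes a byte-level tokenizer, where a trailing run of ids can be decoded without the earlier context:

class StreamDecoder:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        self.buffer = []  # ids of the still-incomplete tail

    def push(self, token_id: int) -> str:
        self.buffer.append(token_id)
        text = self.tokenizer.decode(self.buffer, skip_special_tokens=True)
        if text.endswith("\ufffd"):
            return ""  # character still split across ids; keep buffering
        self.buffer.clear()
        return text

Unlike accumulate_tokens, this only ever re-decodes the few pending ids, and each flush yields exactly the newly completed text.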

wxsms commented 3 months ago

To resolve the � char in accumulate_tokens: true mode, we can simply add errors='ignore' to tokenizer.decode in the postprocessing script. This will strip all � outputs.

activezhao commented 3 months ago

> To resolve the � char in accumulate_tokens: true mode, we can simply add errors='ignore' to tokenizer.decode in the postprocessing script. This will strip all � outputs.

@wxsms So cool, could you please give more details, such as the code?

Thanks.

wxsms commented 3 months ago

For example (in postprocessing/1/model.py):

    def _postprocessing(self, tokens_batch, sequence_lengths):
        outputs = []
        for batch_idx, beam_tokens in enumerate(tokens_batch):
            for beam_idx, tokens in enumerate(beam_tokens):
                seq_len = sequence_lengths[batch_idx][beam_idx]
                # errors='ignore' drops invalid UTF-8 fragments instead of
                # emitting the U+FFFD replacement character.
                output = self.tokenizer.decode(
                    tokens[:seq_len],
                    skip_special_tokens=self.skip_special_tokens,
                    errors='ignore'
                )
                outputs.append(output.encode('utf8'))
        return outputs

activezhao commented 3 months ago

> For example (in postprocessing/1/model.py): …

@wxsms Got it, thanks.

But will the � character be retained, or just skipped?
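
For what it's worth, at the plain-codec level errors='ignore' drops the invalid bytes entirely rather than keeping a placeholder, so the � should be skipped, not retained. A quick illustration in plain Python:

# '滴' is three bytes in UTF-8; truncating it mid-character breaks decoding.
broken = "滴".encode("utf-8")[:2]
print(broken.decode("utf-8", errors="replace"))  # '�'  placeholder kept
print(broken.decode("utf-8", errors="ignore"))   # ''   bytes dropped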

activezhao commented 2 months ago

Hi @handoku, I found a problem when using accumulate_tokens.

With the same prompt and parameters, the ensemble and tensorrt_llm_bls APIs return different results.

curl -X POST localhost:8820/v2/models/tensorrt_llm_bls/generate_stream

curl -X POST localhost:8820/v2/models/tensorrt_llm_bls/generate_stream -d '{"text_input": "\u003creponame\u003ecommon\n\u003cneighbor\u003e\u003cfilename\u003evalue\u003ccodeblock\u003e// Compare this snippet from waitpush/DrugRemindPush.go:...\u003cneighbor\u003e\u003cfilename\u003ekey\u003ccodeblock\u003eDrugRemindPush.go\u003cfilename\u003edosage_form.go\n\u003c|fim▁begin|\u003e\u003creponame\u003eprogramming-language-demo\n\u003cneighbor\u003e\u003cfilename\u003eprime-number.go\u003ccodeblock\u003e// }\n// func isPrime(n int) bool {\n//     if n \u003c 2 {\n//         return false\n//     } else {\n//         for i := 2; i \u003c= n/2; i++ {\n//             if n%i == 0 {\n//                 return false\n//             }\n//         }\n//     }\n//     return true\n// }\n// Functions from import file go/prime-number.go can be referenced:\n// func exitWithError()\n// func main()\n// func isPrime(n int) bool\n// Compare this snippet from go/prime-number.go:\n// package main\n// \n// import (\n//     \"fmt\"\n//     \"os\"\n//     \"strconv\"\n// )\n// \n// func isPrime(n int) bool {\n//     if n \u003c 2 {\n//         return false\n//     } else {\n//         for i := 2; i \u003c= n/2; i++ {\n//             if n%i == 0 {\n//                 return false\n//             }\n//         }\n//     }\n//     return true\n// }\n// \n// func exitWithError() {\n//     fmt.Println(\"Usage: please input a non-negative integer\")\n//     os.Exit(1)\n// }\n// \n// func main() {\n//     if len(os.Args) != 2 {\n//         exitWithError()\n//     }\n// \n//     n, err := strconv.Atoi(os.Args[1])\n//     if err != nil || n \u003c 0 {\n//         exitWithError()\n//     }\n// \n//     if isPrime(n) {\n//         fmt.Println(\"Prime\")\n//     } else {\n//         fmt.Println(\"Composite\")\n//     }\n// }\u003cneighbor\u003e\u003cfilename\u003eprime-number.go\u003ccodeblock\u003e// Functions from import file go/prime-number.go can be referenced:\n// func exitWithError() {\n//     fmt.Println(\"Usage: please input a non-negative integer\")\n//     os.Exit(1)\n// }\n// func main() {\n//     if len(os.Args) != 2 {\n//         exitWithError()\n//     }\n// \n//     n, err := strconv.Atoi(os.Args[1])\n//     if err != nil || n \u003c 0 {\n//         exitWithError()\n//     }\n// \n//     if isPrime(n) {\n//         fmt.Println(\"Prime\")\n//     } else {\n//         fmt.Println(\"Composite\")\n//     }\n// }\n// func isPrime(n int) bool {\n//     if n \u003c 2 {\n//         return false\n//     } else {\n//         for i := 2; i \u003c= n/2; i++ {\n//             if n%i == 0 {\n//                 return false\n//             }\n//         }\n//     }\n//     return true\n// }\n// Functions from import file go/prime-number.go can be referenced:\n// func exitWithError()\n// func main()\n// func isPrime(n int) bool\n// Compare this snippet from go/prime-number.go:\n// package main\n// \n// import (\n//     \"fmt\"\n//     \"os\"\n//     \"strconv\"\n// )\n// \n// func isPrime(n int) bool {\n//     if n \u003c 2 {\n//         return false\n//     } else {\n//         for i := 2; i \u003c= n/2; i++ {\n//             if n%i == 0 {\n//                 return false\n//             }\n//         }\n//     }\n//     return true\n// }\n// \n// func exitWithError() {\u003cneighbor\u003e\u003cfilename\u003elongest-word.go\u003ccodeblock\u003e// Variables from import file go/longest-word.go can be referenced:\n// errorMessage = \"Usage: please provide a string\"\n// Functions from import file go/longest-word.go can be 
referenced:\n// func longestWordLength(str string) int {\n//     words := strings.FieldsFunc(str, isLimitedWhitespace)\n//     return longestStringLength(words)\n// }\n// func isLimitedWhitespace(r rune) bool {\n//     return strings.ContainsRune(\" \\t\\n\\r\", r)\n// }\n// func longestStringLength(strs []string) (longest int) {\n//     for _, str := range strs {\n//         if len(str) \u003e longest {\n//             longest = len(str)\n//         }\n//     }\n//     return\n// }\n// Functions from import file go/longest-word.go can be referenced:\n// func longestWordLength(str string) int\n// func isLimitedWhitespace(r rune) bool\n// func longestStringLength(strs []string) (longest int)\u003cneighbor\u003e\u003cfilename\u003efactorial.go\u003ccodeblock\u003e// Functions from import file go/factorial.go can be referenced:\n// func exitWithError(msg string) {\n//     fmt.Println(msg)\n//     os.Exit(1)\n// }\n// func factorial(n uint64) uint64 {\n//     if n \u003c= 0 {\n//         return 1\n//     }\n//     return n * factorial(n-1)\n// }\n// Functions from import file go/factorial.go can be referenced:\n// func exitWithError(msg string)\n// func factorial(n uint64) uint64\u003cfilename\u003elongest-common-subsequence.go\n\u003ccodecontent\u003epackage main\nimport (\n    \"encoding/json\"\n    \"fmt\"\n    \"os\"\n    \"regexp\"\n    \"strconv\"\n    \"strings\"\n)\n//exitWithError\n\u003c|fim▁end|\u003e}\n\u003c|fim▁hole|\u003e", "max_tokens": 50, "bad_words": "", "stop_words": "", "stream": false, "temperature": 0.2, "top_p": 0.95, "return_log_probs": true, "generation_logits": true}'

The result is:

data: {"context_logits":0.0,"cum_log_probs":-77.98719787597656,"generation_logits":0.0,"model_name":"tensorrt_llm_bls","model_version":"1","output_log_probs":[-1.3984918594360352,-3.991654872894287,-2.127605676651001,-0.18318799138069154,-0.15039844810962678,-0.3713747262954712,-2.1666009426116945,-0.03320259973406792,-0.6704073548316956,-3.395005941390991,-6.215298652648926,-3.6144485473632814,-3.8179116249084474,-1.1550722122192383,-1.0524828433990479,-0.32207995653152468,-0.4670903980731964,-5.648696422576904,-3.6973865032196047,-3.8024346828460695,-0.13288161158561707,-3.7232208251953127,-2.065372943878174,-0.026736034080386163,-0.30800527334213259,-0.15478214621543885,-3.5880002975463869,-2.564371109008789,-1.118330717086792,-0.008484973572194577,-1.2587940692901612,-0.5912411212921143,-2.966789484024048,-2.6259653568267824,-0.009489176794886589,-0.018396474421024324,-0.12405481934547425,-2.876150131225586,-0.15892530977725984,-3.3690268993377687,-3.163250684738159,-1.4551129341125489,-0.021045353263616563,-0.0005316358874551952,-0.05893709510564804,-1.1418265104293824,-0.00010598267544992268,-0.03211848437786102,-0.10972829163074494,-0.03469150885939598],"text_output":"//findLCS\n//main\n//func removeWhiteSpace\n//func processCommandLineArgs\n//func main() {\n//    var (\n//        lcs       = findLCS(os.Args[1"}

The relevant part of text_output is:

//findLCS
//main
//func removeWhiteSpace
//func processCommandLineArgs
//func main() {
//    var (
//        lcs       = findLCS(os.Args[1

curl -X POST localhost:8820/v2/models/ensemble/generate_stream

curl -X POST localhost:8820/v2/models/ensemble/generate_stream -d '{"text_input": "\u003creponame\u003ecommon\n\u003cneighbor\u003e\u003cfilename\u003evalue\u003ccodeblock\u003e// Compare this snippet from waitpush/DrugRemindPush.go:...\u003cneighbor\u003e\u003cfilename\u003ekey\u003ccodeblock\u003eDrugRemindPush.go\u003cfilename\u003edosage_form.go\n\u003c|fim▁begin|\u003e\u003creponame\u003eprogramming-language-demo\n\u003cneighbor\u003e\u003cfilename\u003eprime-number.go\u003ccodeblock\u003e// }\n// func isPrime(n int) bool {\n//     if n \u003c 2 {\n//         return false\n//     } else {\n//         for i := 2; i \u003c= n/2; i++ {\n//             if n%i == 0 {\n//                 return false\n//             }\n//         }\n//     }\n//     return true\n// }\n// Functions from import file go/prime-number.go can be referenced:\n// func exitWithError()\n// func main()\n// func isPrime(n int) bool\n// Compare this snippet from go/prime-number.go:\n// package main\n// \n// import (\n//     \"fmt\"\n//     \"os\"\n//     \"strconv\"\n// )\n// \n// func isPrime(n int) bool {\n//     if n \u003c 2 {\n//         return false\n//     } else {\n//         for i := 2; i \u003c= n/2; i++ {\n//             if n%i == 0 {\n//                 return false\n//             }\n//         }\n//     }\n//     return true\n// }\n// \n// func exitWithError() {\n//     fmt.Println(\"Usage: please input a non-negative integer\")\n//     os.Exit(1)\n// }\n// \n// func main() {\n//     if len(os.Args) != 2 {\n//         exitWithError()\n//     }\n// \n//     n, err := strconv.Atoi(os.Args[1])\n//     if err != nil || n \u003c 0 {\n//         exitWithError()\n//     }\n// \n//     if isPrime(n) {\n//         fmt.Println(\"Prime\")\n//     } else {\n//         fmt.Println(\"Composite\")\n//     }\n// }\u003cneighbor\u003e\u003cfilename\u003eprime-number.go\u003ccodeblock\u003e// Functions from import file go/prime-number.go can be referenced:\n// func exitWithError() {\n//     fmt.Println(\"Usage: please input a non-negative integer\")\n//     os.Exit(1)\n// }\n// func main() {\n//     if len(os.Args) != 2 {\n//         exitWithError()\n//     }\n// \n//     n, err := strconv.Atoi(os.Args[1])\n//     if err != nil || n \u003c 0 {\n//         exitWithError()\n//     }\n// \n//     if isPrime(n) {\n//         fmt.Println(\"Prime\")\n//     } else {\n//         fmt.Println(\"Composite\")\n//     }\n// }\n// func isPrime(n int) bool {\n//     if n \u003c 2 {\n//         return false\n//     } else {\n//         for i := 2; i \u003c= n/2; i++ {\n//             if n%i == 0 {\n//                 return false\n//             }\n//         }\n//     }\n//     return true\n// }\n// Functions from import file go/prime-number.go can be referenced:\n// func exitWithError()\n// func main()\n// func isPrime(n int) bool\n// Compare this snippet from go/prime-number.go:\n// package main\n// \n// import (\n//     \"fmt\"\n//     \"os\"\n//     \"strconv\"\n// )\n// \n// func isPrime(n int) bool {\n//     if n \u003c 2 {\n//         return false\n//     } else {\n//         for i := 2; i \u003c= n/2; i++ {\n//             if n%i == 0 {\n//                 return false\n//             }\n//         }\n//     }\n//     return true\n// }\n// \n// func exitWithError() {\u003cneighbor\u003e\u003cfilename\u003elongest-word.go\u003ccodeblock\u003e// Variables from import file go/longest-word.go can be referenced:\n// errorMessage = \"Usage: please provide a string\"\n// Functions from import file go/longest-word.go can be 
referenced:\n// func longestWordLength(str string) int {\n//     words := strings.FieldsFunc(str, isLimitedWhitespace)\n//     return longestStringLength(words)\n// }\n// func isLimitedWhitespace(r rune) bool {\n//     return strings.ContainsRune(\" \\t\\n\\r\", r)\n// }\n// func longestStringLength(strs []string) (longest int) {\n//     for _, str := range strs {\n//         if len(str) \u003e longest {\n//             longest = len(str)\n//         }\n//     }\n//     return\n// }\n// Functions from import file go/longest-word.go can be referenced:\n// func longestWordLength(str string) int\n// func isLimitedWhitespace(r rune) bool\n// func longestStringLength(strs []string) (longest int)\u003cneighbor\u003e\u003cfilename\u003efactorial.go\u003ccodeblock\u003e// Functions from import file go/factorial.go can be referenced:\n// func exitWithError(msg string) {\n//     fmt.Println(msg)\n//     os.Exit(1)\n// }\n// func factorial(n uint64) uint64 {\n//     if n \u003c= 0 {\n//         return 1\n//     }\n//     return n * factorial(n-1)\n// }\n// Functions from import file go/factorial.go can be referenced:\n// func exitWithError(msg string)\n// func factorial(n uint64) uint64\u003cfilename\u003elongest-common-subsequence.go\n\u003ccodecontent\u003epackage main\nimport (\n    \"encoding/json\"\n    \"fmt\"\n    \"os\"\n    \"regexp\"\n    \"strconv\"\n    \"strings\"\n)\n//exitWithError\n\u003c|fim▁end|\u003e}\n\u003c|fim▁hole|\u003e", "max_tokens": 50, "bad_words": "", "stop_words": "", "stream": false, "temperature": 0.2, "top_p": 0.95, "return_log_probs": true, "generation_logits": true}'

The result is:

data: {"context_logits":0.0,"cum_log_probs":-77.98719787597656,"generation_logits":0.0,"model_name":"tensorrt_llm_bls","model_version":"1","output_log_probs":[-1.3984918594360352,-3.991654872894287,-2.127605676651001,-0.18318799138069154,-0.15039844810962678,-0.3713747262954712,-2.1666009426116945,-0.03320259973406792,-0.6704073548316956,-3.395005941390991,-6.215298652648926,-3.6144485473632814,-3.8179116249084474,-1.1550722122192383,-1.0524828433990479,-0.32207995653152468,-0.4670903980731964,-5.648696422576904,-3.6973865032196047,-3.8024346828460695,-0.13288161158561707,-3.7232208251953127,-2.065372943878174,-0.026736034080386163,-0.30800527334213259,-0.15478214621543885,-3.5880002975463869,-2.564371109008789,-1.118330717086792,-0.008484973572194577,-1.2587940692901612,-0.5912411212921143,-2.966789484024048,-2.6259653568267824,-0.009489176794886589,-0.018396474421024324,-0.12405481934547425,-2.876150131225586,-0.15892530977725984,-3.3690268993377687,-3.163250684738159,-1.4551129341125489,-0.021045353263616563,-0.0005316358874551952,-0.05893709510564804,-1.1418265104293824,-0.00010598267544992268,-0.03211848437786102,-0.10972829163074494,-0.03469150885939598],"text_output":"//findLCS\n//main\n//func removeWhiteSpace\n//func processCommandLineArgs\n//func main() {\n//    var (\n//        lcs       = findLCS(os.Args[1"}

The relevant part of text_output is:

func exitWithError(msg string) {
    fmt.Println(msg)
    os.Exit(1)
}
//longestCommonSubsequence
func longestCommonSubsequence(a, b string) string {

In fact, the result from ensemble is the expected one.

I'm confused about why this happens; I would expect the two results to be identical.
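One thing worth ruling out before digging deeper: with temperature 0.2 and top_p 0.95, decoding is stochastic, so two endpoints (or even two runs against the same endpoint) are not guaranteed to match token-for-token. A minimal sketch of how the two models could be compared deterministically (the prompt is elided; top_k and random_seed only take effect if the ensemble config exposes them):

```python
# Sketch: compare ensemble and tensorrt_llm_bls with deterministic decoding.
# Assumes the Triton HTTP generate endpoint used throughout this thread.
import requests

payload = {
    "text_input": "package main\n// ... prompt elided ...",
    "max_tokens": 50,
    "bad_words": "",
    "stop_words": "",
    "stream": False,
    "top_k": 1,           # greedy decoding: pick the arg-max token each step
    # "random_seed": 42,  # alternative: keep sampling but pin the RNG seed
}

for model in ("ensemble", "tensorrt_llm_bls"):
    r = requests.post(
        f"http://localhost:8820/v2/models/{model}/generate", json=payload
    )
    print(model, "->", r.json().get("text_output"))
```

With greedy decoding, any remaining difference between the two endpoints would point at the pipelines themselves rather than at sampling noise.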

Have you ever encountered this problem?

Thanks.

activezhao commented 2 months ago

Hi @handoku, I found a problem when using accumulate_tokens.

When the prompt and parameters are the same, the ensemble and tensorrt_llm_bls APIs return different results.
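For context on what accumulate_tokens changes: as I understand it, with this option the BLS model decodes the full accumulated token-id list at every streaming step and emits only the new suffix, instead of decoding each token in isolation. A rough illustration of the difference using the Hugging Face tokenizer (my own sketch, not the backend's actual code):

```python
# Sketch of per-token decoding vs accumulate_tokens-style decoding.
# Assumes the model's Hugging Face tokenizer can be downloaded.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base")
ids = tok.encode("颗粒剂", add_special_tokens=False)

# Decoding each streamed token on its own: a multi-byte character can be
# split across byte-level BPE tokens, and the fragments render as U+FFFD (�).
print([tok.decode([i]) for i in ids])

# Accumulate-then-diff decoding: decode the whole id list each step and emit
# only the newly completed suffix, so characters appear once they are whole.
acc, prev = [], ""
for i in ids:
    acc.append(i)
    text = tok.decode(acc)
    print(repr(text[len(prev):]))  # the chunk a streaming client would see
    prev = text
```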

curl -X POST localhost:8820/v2/models/tensorrt_llm_bls/generate_stream

curl -X POST localhost:8820/v2/models/tensorrt_llm_bls/generate_stream -d '{"text_input": "\u003creponame\u003ecommon\n\u003cneighbor\u003e\u003cfilename\u003evalue\u003ccodeblock\u003e// Compare this snippet from waitpush/DrugRemindPush.go:...\u003cneighbor\u003e\u003cfilename\u003ekey\u003ccodeblock\u003eDrugRemindPush.go\u003cfilename\u003edosage_form.go\n\u003c|fim▁begin|\u003e\u003creponame\u003eprogramming-language-demo\n\u003cneighbor\u003e\u003cfilename\u003eprime-number.go\u003ccodeblock\u003e// }\n// func isPrime(n int) bool {\n//     if n \u003c 2 {\n//         return false\n//     } else {\n//         for i := 2; i \u003c= n/2; i++ {\n//             if n%i == 0 {\n//                 return false\n//             }\n//         }\n//     }\n//     return true\n// }\n// Functions from import file go/prime-number.go can be referenced:\n// func exitWithError()\n// func main()\n// func isPrime(n int) bool\n// Compare this snippet from go/prime-number.go:\n// package main\n// \n// import (\n//     \"fmt\"\n//     \"os\"\n//     \"strconv\"\n// )\n// \n// func isPrime(n int) bool {\n//     if n \u003c 2 {\n//         return false\n//     } else {\n//         for i := 2; i \u003c= n/2; i++ {\n//             if n%i == 0 {\n//                 return false\n//             }\n//         }\n//     }\n//     return true\n// }\n// \n// func exitWithError() {\n//     fmt.Println(\"Usage: please input a non-negative integer\")\n//     os.Exit(1)\n// }\n// \n// func main() {\n//     if len(os.Args) != 2 {\n//         exitWithError()\n//     }\n// \n//     n, err := strconv.Atoi(os.Args[1])\n//     if err != nil || n \u003c 0 {\n//         exitWithError()\n//     }\n// \n//     if isPrime(n) {\n//         fmt.Println(\"Prime\")\n//     } else {\n//         fmt.Println(\"Composite\")\n//     }\n// }\u003cneighbor\u003e\u003cfilename\u003eprime-number.go\u003ccodeblock\u003e// Functions from import file go/prime-number.go can be referenced:\n// func exitWithError() {\n//     fmt.Println(\"Usage: please input a non-negative integer\")\n//     os.Exit(1)\n// }\n// func main() {\n//     if len(os.Args) != 2 {\n//         exitWithError()\n//     }\n// \n//     n, err := strconv.Atoi(os.Args[1])\n//     if err != nil || n \u003c 0 {\n//         exitWithError()\n//     }\n// \n//     if isPrime(n) {\n//         fmt.Println(\"Prime\")\n//     } else {\n//         fmt.Println(\"Composite\")\n//     }\n// }\n// func isPrime(n int) bool {\n//     if n \u003c 2 {\n//         return false\n//     } else {\n//         for i := 2; i \u003c= n/2; i++ {\n//             if n%i == 0 {\n//                 return false\n//             }\n//         }\n//     }\n//     return true\n// }\n// Functions from import file go/prime-number.go can be referenced:\n// func exitWithError()\n// func main()\n// func isPrime(n int) bool\n// Compare this snippet from go/prime-number.go:\n// package main\n// \n// import (\n//     \"fmt\"\n//     \"os\"\n//     \"strconv\"\n// )\n// \n// func isPrime(n int) bool {\n//     if n \u003c 2 {\n//         return false\n//     } else {\n//         for i := 2; i \u003c= n/2; i++ {\n//             if n%i == 0 {\n//                 return false\n//             }\n//         }\n//     }\n//     return true\n// }\n// \n// func exitWithError() {\u003cneighbor\u003e\u003cfilename\u003elongest-word.go\u003ccodeblock\u003e// Variables from import file go/longest-word.go can be referenced:\n// errorMessage = \"Usage: please provide a string\"\n// Functions from import file go/longest-word.go can be 
referenced:\n// func longestWordLength(str string) int {\n//     words := strings.FieldsFunc(str, isLimitedWhitespace)\n//     return longestStringLength(words)\n// }\n// func isLimitedWhitespace(r rune) bool {\n//     return strings.ContainsRune(\" \\t\\n\\r\", r)\n// }\n// func longestStringLength(strs []string) (longest int) {\n//     for _, str := range strs {\n//         if len(str) \u003e longest {\n//             longest = len(str)\n//         }\n//     }\n//     return\n// }\n// Functions from import file go/longest-word.go can be referenced:\n// func longestWordLength(str string) int\n// func isLimitedWhitespace(r rune) bool\n// func longestStringLength(strs []string) (longest int)\u003cneighbor\u003e\u003cfilename\u003efactorial.go\u003ccodeblock\u003e// Functions from import file go/factorial.go can be referenced:\n// func exitWithError(msg string) {\n//     fmt.Println(msg)\n//     os.Exit(1)\n// }\n// func factorial(n uint64) uint64 {\n//     if n \u003c= 0 {\n//         return 1\n//     }\n//     return n * factorial(n-1)\n// }\n// Functions from import file go/factorial.go can be referenced:\n// func exitWithError(msg string)\n// func factorial(n uint64) uint64\u003cfilename\u003elongest-common-subsequence.go\n\u003ccodecontent\u003epackage main\nimport (\n    \"encoding/json\"\n    \"fmt\"\n    \"os\"\n    \"regexp\"\n    \"strconv\"\n    \"strings\"\n)\n//exitWithError\n\u003c|fim▁end|\u003e}\n\u003c|fim▁hole|\u003e", "max_tokens": 50, "bad_words": "", "stop_words": "", "stream": false, "temperature": 0.2, "top_p": 0.95, "return_log_probs": true, "generation_logits": true}'

The result is:

data: {"context_logits":0.0,"cum_log_probs":-77.98719787597656,"generation_logits":0.0,"model_name":"tensorrt_llm_bls","model_version":"1","output_log_probs":[-1.3984918594360352,-3.991654872894287,-2.127605676651001,-0.18318799138069154,-0.15039844810962678,-0.3713747262954712,-2.1666009426116945,-0.03320259973406792,-0.6704073548316956,-3.395005941390991,-6.215298652648926,-3.6144485473632814,-3.8179116249084474,-1.1550722122192383,-1.0524828433990479,-0.32207995653152468,-0.4670903980731964,-5.648696422576904,-3.6973865032196047,-3.8024346828460695,-0.13288161158561707,-3.7232208251953127,-2.065372943878174,-0.026736034080386163,-0.30800527334213259,-0.15478214621543885,-3.5880002975463869,-2.564371109008789,-1.118330717086792,-0.008484973572194577,-1.2587940692901612,-0.5912411212921143,-2.966789484024048,-2.6259653568267824,-0.009489176794886589,-0.018396474421024324,-0.12405481934547425,-2.876150131225586,-0.15892530977725984,-3.3690268993377687,-3.163250684738159,-1.4551129341125489,-0.021045353263616563,-0.0005316358874551952,-0.05893709510564804,-1.1418265104293824,-0.00010598267544992268,-0.03211848437786102,-0.10972829163074494,-0.03469150885939598],"text_output":"//findLCS\n//main\n//func removeWhiteSpace\n//func processCommandLineArgs\n//func main() {\n//    var (\n//        lcs       = findLCS(os.Args[1"}

The relevant part of text_output is:

//findLCS
//main
//func removeWhiteSpace
//func processCommandLineArgs
//func main() {
//    var (
//        lcs       = findLCS(os.Args[1

curl -X POST localhost:8820/v2/models/ensemble/generate_stream

The ensemble request and the raw result are exactly the same as in my previous comment above. The relevant part of text_output is:

func exitWithError(msg string) {
    fmt.Println(msg)
    os.Exit(1)
}
//longestCommonSubsequence
func longestCommonSubsequence(a, b string) string {

In fact, the result from ensemble is the expected one. I'm confused about why this happens, since I would expect the two endpoints to return identical results.

Have you ever encountered this problem?

@byshiue @Tracin Could you please take a look at this problem?

Thanks.
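If it helps to debug, the per-token output_log_probs returned above make it easy to locate where two responses first diverge. A small hypothetical helper (the function name and signature are my own):

```python
# Locate the first token index at which two responses diverge, using the
# per-token output_log_probs lists from the two JSON responses above.
from typing import Optional, Sequence

def first_divergence(a: Sequence[float], b: Sequence[float],
                     tol: float = 1e-4) -> Optional[int]:
    """Index of the first log-prob differing by more than tol, else None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if abs(x - y) > tol:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one response is a prefix of the other
    return None
```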