triton-inference-server / tensorrtllm_backend
The Triton TensorRT-LLM Backend
Apache License 2.0 · 711 stars · 108 forks

Issues
#652 Update TensorRT-LLM backend (kaiyux, closed, 1 hour ago, 0 comments)
#651 triton streaming is not working as expected (robosina, opened, 12 hours ago, 0 comments)
#650 [PoC] Improve TRTLLM deployment UX (rmccorm4, opened, 3 days ago, 0 comments)
#649 fix: Fix typo with 'false' and pre-commit failures (rmccorm4, closed, 3 days ago, 1 comment)
#648 Qwen2-VL support (zrczrczrc, opened, 5 days ago, 0 comments)
#647 Update TensorRT-LLM backend (kaiyux, closed, 6 days ago, 0 comments)
#646 Stub process 'whisper_bls_0_0' is not healthy. (MrD005, opened, 1 week ago, 0 comments)
#644 Update the multinode tutorial link (harryskim, opened, 1 week ago, 0 comments)
#645 tensorrtllm backend fails when kv cache is disabled (ShuaiShao93, opened, 1 week ago, 5 comments)
#643 Update TensorRT-LLM backend (kaiyux, closed, 2 weeks ago, 0 comments)
#642 With same engine, trtllm backend is 40x slower than TensorRT-LLM/examples/run.py (ShuaiShao93, closed, 1 week ago, 1 comment)
#641 tritonserver does not load Lora automatically (Alireza3242, closed, 1 week ago, 1 comment)
#640 problem with streaming (Alireza3242, closed, 1 week ago, 1 comment)
#639 Support non-detached mode for python trtllm backend (ShuaiShao93, opened, 2 weeks ago, 0 comments)
#638 Update TensorRT-LLM backend (kaiyux, closed, 3 weeks ago, 0 comments)
#637 Update TensorRT-LLM backend v0.14.0 (kaiyux, closed, 3 weeks ago, 0 comments)
#635 Update TensorRT-LLM backend (kaiyux, closed, 3 weeks ago, 0 comments)
#634 sequence_length output tensor does not reflect the position of end_id token (jxchenus, closed, 3 weeks ago, 2 comments)
#632 problem with output_log_probs (Alireza3242, opened, 4 weeks ago, 3 comments)
#631 Fix broken links in README.md (benchislett, opened, 1 month ago, 0 comments)
#630 the output of bls is unstable (dwq370, opened, 1 month ago, 0 comments)
#629 Update TensorRT-LLM backend (kaiyux, closed, 1 month ago, 0 comments)
#628 Update launch_triton_server.py (ankur1-samsung, opened, 1 month ago, 0 comments)
#627 tensorrtllm_backend/scripts/launch_triton_server.py parse_arguments() typo correction (ankur1-samsung, closed, 1 month ago, 2 comments)
#626 Streaming Inference Failure (imilli, opened, 1 month ago, 1 comment)
#625 The GPU memory usage is too high (imilli, opened, 1 month ago, 1 comment)
#624 Garbage response when input tokens is longer than 4096 on Llama-3.1-8B-Instruct (winstxnhdw, opened, 1 month ago, 2 comments)
#623 Failed install in nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3 (wwx007121, opened, 1 month ago, 0 comments)
#622 Performance Issue with inflight_batcher_llm Model in v0.13.0 (junstar92, opened, 1 month ago, 1 comment)
#621 A bug in sending an inference request using the tensorrt_llm_bls model (Noblezhong, closed, 1 month ago, 1 comment)
#620 Update TensorRT-LLM backend (kaiyux, closed, 1 month ago, 0 comments)
#619 Throw ZeroDivisionError when benchmark (moyerlee, closed, 1 month ago, 0 comments)
#618 unable to load shared library: /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm_common.so: undefined symbol: _ZNK12tensorrt_llm8executor8Response11getErrorMsgB5cxx11Ev (wwx007121, closed, 1 month ago, 3 comments)
#636 Stark Difference in GPU Usage of Triton Servers with Llama3 and Llama3.1 models (jasonngap1, opened, 1 month ago, 1 comment)
#617 make 2 instance (Alireza3242, opened, 1 month ago, 0 comments)
#616 fill_template.py and gpu_device_ids (Alireza3242, opened, 1 month ago, 0 comments)
#615 Support dynamic path for gpt_model_path and token_dir based on Triton model repo (rahchuenmonroe, opened, 1 month ago, 0 comments)
#614 Result is out of order when using http stream mode (liu21yd, opened, 1 month ago, 4 comments)
#613 An error that `Shape does not match true shape of 'data' field` occurs when using tensorrt_llm model alone in inflight_batcher_llm (junstar92, closed, 2 weeks ago, 1 comment)
#612 support for whisper trt-llm engine triton deployment (haiderasad, opened, 1 month ago, 1 comment)
#611 Update TensorRT-LLM backend (kaiyux, closed, 1 month ago, 0 comments)
#610 Is ReDrafter supported by the TensorRT-LLM backend? (vkc1vk, opened, 1 month ago, 2 comments)
#609 Dynamic batching not working (ShuaiShao93, closed, 1 month ago, 1 comment)
#608 Update TensorRT-LLM backend (DanBlanaru, closed, 1 month ago, 0 comments)
#607 TensorRT-LLM backend v0.13 Update (Shixiaowei02, closed, 1 month ago, 0 comments)
#605 Is it possible to edit backend within config.pbtxt from python backend to tensorrtllm backend, Whisper model? (rungrodkspeed, closed, 3 weeks ago, 1 comment)
#604 Update llama.md (surprisedPikachu007, opened, 2 months ago, 0 comments)
#603 Bad quality in answers (repetition, non stop...) when using Llama3.1-8B-Instruct and Triton (alvaroalfaro612, opened, 2 months ago, 1 comment)
#602 Update TensorRT-LLM backend (kaiyux, closed, 2 months ago, 0 comments)
#606 Qwen2-14B generate_stream return some garbled code (kazyun, opened, 2 months ago, 4 comments)