triton-inference-server / tensorrtllm_backend
The Triton TensorRT-LLM Backend
Apache License 2.0 · 711 stars · 108 forks

Issues
#652 Update TensorRT-LLM backend (kaiyux, closed, 1 hour ago, 0 comments)
#651 triton streaming is not working as expected (robosina, opened, 12 hours ago, 0 comments)
#650 [PoC] Improve TRTLLM deployment UX (rmccorm4, opened, 3 days ago, 0 comments)
#649 fix: Fix typo with 'false' and pre-commit failures (rmccorm4, closed, 3 days ago, 1 comment)
#648 Qwen2-VL support (zrczrczrc, opened, 5 days ago, 0 comments)
#647 Update TensorRT-LLM backend (kaiyux, closed, 6 days ago, 0 comments)
#646 Stub process 'whisper_bls_0_0' is not healthy. (MrD005, opened, 1 week ago, 0 comments)
#644 Update the multinode tutorial link (harryskim, opened, 1 week ago, 0 comments)
#645 tensorrtllm backend fails when kv cache is disabled (ShuaiShao93, opened, 1 week ago, 5 comments)
#643 Update TensorRT-LLM backend (kaiyux, closed, 2 weeks ago, 0 comments)
#642 With same engine, trtllm backend is 40x slower than TensorRT-LLM/examples/run.py (ShuaiShao93, closed, 1 week ago, 1 comment)
#641 tritonserver does not load Lora automatically (Alireza3242, closed, 1 week ago, 1 comment)
#640 problem with streaming (Alireza3242, closed, 1 week ago, 1 comment)
#639 Support non-detached mode for python trtllm backend (ShuaiShao93, opened, 2 weeks ago, 0 comments)
#638 Update TensorRT-LLM backend (kaiyux, closed, 3 weeks ago, 0 comments)
#637 Update TensorRT-LLM backend v0.14.0 (kaiyux, closed, 3 weeks ago, 0 comments)
#635 Update TensorRT-LLM backend (kaiyux, closed, 3 weeks ago, 0 comments)
#634 sequence_length output tensor does not reflect the position of end_id token (jxchenus, closed, 3 weeks ago, 2 comments)
#632 problem with output_log_probs (Alireza3242, opened, 4 weeks ago, 3 comments)
#631 Fix broken links in README.md (benchislett, opened, 1 month ago, 0 comments)
#630 the output of bls is unstable (dwq370, opened, 1 month ago, 0 comments)
#629 Update TensorRT-LLM backend (kaiyux, closed, 1 month ago, 0 comments)
#628 Update launch_triton_server.py (ankur1-samsung, opened, 1 month ago, 0 comments)
#627 tensorrtllm_backend/scripts/launch_triton_server.py parse_arguments() typo correction (ankur1-samsung, closed, 1 month ago, 2 comments)
#626 Streaming Inference Failure (imilli, opened, 1 month ago, 1 comment)
#625 The GPU memory usage is too high (imilli, opened, 1 month ago, 1 comment)
#624 Garbage response when input tokens is longer than 4096 on Llama-3.1-8B-Instruct (winstxnhdw, opened, 1 month ago, 2 comments)
#623 Failed install in nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3 (wwx007121, opened, 1 month ago, 0 comments)
#622 Performance Issue with inflight_batcher_llm Model in v0.13.0 (junstar92, opened, 1 month ago, 1 comment)
#621 A bug in sending an inference request using the tensorrt_llm_bls model (Noblezhong, closed, 1 month ago, 1 comment)
#620 Update TensorRT-LLM backend (kaiyux, closed, 1 month ago, 0 comments)
#619 Throw ZeroDivisionError when benchmark (moyerlee, closed, 1 month ago, 0 comments)
#618 unable to load shared library: /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm_common.so: undefined symbol: _ZNK12tensorrt_llm8executor8Response11getErrorMsgB5cxx11Ev (wwx007121, closed, 1 month ago, 3 comments)
#636 Stark Difference in GPU Usage of Triton Servers with Llama3 and Llama3.1 models (jasonngap1, opened, 1 month ago, 1 comment)
#617 make 2 instance (Alireza3242, opened, 1 month ago, 0 comments)
#616 fill_template.py and gpu_device_ids (Alireza3242, opened, 1 month ago, 0 comments)
#615 Support dynamic path for gpt_model_path and token_dir based on Triton model repo (rahchuenmonroe, opened, 1 month ago, 0 comments)
#614 Result is out of order when using http stream mode (liu21yd, opened, 1 month ago, 4 comments)
#613 An error that `Shape does not match true shape of 'data' field` occurs when using tensorrt_llm model alone in inflight_batcher_llm (junstar92, closed, 2 weeks ago, 1 comment)
#612 support for whisper trt-llm engine triton deployment (haiderasad, opened, 1 month ago, 1 comment)
#611 Update TensorRT-LLM backend (kaiyux, closed, 1 month ago, 0 comments)
#610 Is ReDrafter supported by the TensorRT-LLM backend? (vkc1vk, opened, 1 month ago, 2 comments)
#609 Dynamic batching not working (ShuaiShao93, closed, 1 month ago, 1 comment)
#608 Update TensorRT-LLM backend (DanBlanaru, closed, 1 month ago, 0 comments)
#607 TensorRT-LLM backend v0.13 Update (Shixiaowei02, closed, 1 month ago, 0 comments)
#605 Is it possible to edit backend within config.pbtxt from python backend to tensorrtllm backend, Whisper model? (rungrodkspeed, closed, 3 weeks ago, 1 comment)
#604 Update llama.md (surprisedPikachu007, opened, 2 months ago, 0 comments)
#603 Bad quality in answers (repetition, non stop...) when using Llama3.1-8B-Instruct and Triton (alvaroalfaro612, opened, 2 months ago, 1 comment)
#602 Update TensorRT-LLM backend (kaiyux, closed, 2 months ago, 0 comments)
#606 Qwen2-14B generate_stream return some garbled code (kazyun, opened, 2 months ago, 4 comments)