triton-inference-server / tensorrtllm_backend
The Triton TensorRT-LLM Backend · Apache License 2.0 · 581 stars · 81 forks
Issues
Sorted by: Newest
#415 · The crash occurred when attempting to quantize the LLaMA model with W4A(fp)8_AWQ · pandengyao · closed 2 months ago · 4 comments
#413 · The result of using inflight_batcher_llm_client to send multiple LoRA weights is not the same as using tensorrtllm · stifles · opened 2 months ago · 3 comments
#412 · Feature Request: Set maximum number of in flight · TheCodeWrangler · opened 2 months ago · 0 comments
#411 · Block reuse is currently not supported with beam width > 1 · tonylek · opened 2 months ago · 3 comments
#410 · Update TensorRT-LLM backend · kaiyux · closed 2 months ago · 0 comments
#409 · Update end_to_end_test.py · r0cketdyne · opened 2 months ago · 0 comments
#422 · Dynamic batching not working properly with tensorrtllm_backend · gavinzb · opened 2 months ago · 3 comments
#408 · Supporting beam search in streaming mode · tonylek · opened 2 months ago · 0 comments
#407 · Update TensorRT-LLM backend · kaiyux · closed 2 months ago · 0 comments
#406 · lora_task_id, lora_weights, lora_config not found in all_models/inflight_batcher_llm/tensorrt_llm_bls/1/lib/decode.py · liao217 · opened 2 months ago · 0 comments
#404 · Limited batched streaming when using inflight batching · vnkc1 · closed 2 months ago · 3 comments
#403 · Support bfloat16 LoRA Adaptors · TheCodeWrangler · opened 2 months ago · 5 comments
#402 · Feature request: support 'max_input_len' and 'max_num_tokens' in config.pbtxt of tensorrtllm · Saigut · closed 2 months ago · 4 comments
#401 · The tensorrtllm and onnxruntime backends · tricky61 · closed 2 months ago · 2 comments
#399 · Example of LoRA weights · TheCodeWrangler · opened 2 months ago · 2 comments
#398 · There is no option to set world_size in config file in model repository · Saigut · closed 2 months ago · 4 comments
#397 · Update TensorRT-LLM backend · kaiyux · closed 2 months ago · 0 comments
#396 · Can't launch tensorrtllm_backend Triton server in 24.02-trtllm-python-py3 · Saigut · closed 2 months ago · 2 comments
#395 · [Documentation improvement] Improve README for tensorrtllm_backend - v0.8.0 · kelkarn · opened 2 months ago · 0 comments
#394 · Build via Docker is much bigger than the image in NGC · ZJU-lishuang · opened 2 months ago · 5 comments
#393 · Question about "model not found" · geraldstanje · opened 2 months ago · 1 comment
#392 · Crash with high request concurrency · silverriver · opened 2 months ago · 3 comments
#391 · Can the tensorrtllm backend support LogitsProcessor? · Muxv · opened 2 months ago · 1 comment
#390 · TensorRT-LLM often hangs using both `tp_size 2` and `enable_context_fmha` · lkm2835 · opened 2 months ago · 2 comments
#389 · Confusion about versions and NGC images · mbahri · opened 2 months ago · 4 comments
#388 · Safetensors and OpenAI-style endpoint · RonanKMcGovern · opened 3 months ago · 3 comments
#387 · Update TensorRT-LLM backend · kaiyux · closed 3 months ago · 0 comments
#386 · RemoteDisconnected('Remote end closed connection without response') · trillionmonster · opened 3 months ago · 1 comment
#385 · Can't build Docker image with Ryzen 5950X · mallorbc · opened 3 months ago · 2 comments
#384 · Update TensorRT-LLM backend · Shixiaowei02 · closed 3 months ago · 0 comments
#383 · [Question] How to know when inference has finished with the generate_stream API? · activezhao · opened 3 months ago · 5 comments
#382 · Right- vs. left-sided padding in TensorRT-LLM backend examples · JohnGiorgi · closed 2 months ago · 2 comments
#381 · Crashes for long context requests · Pernekhan · opened 3 months ago · 15 comments
#380 · Update TensorRT-LLM backend · kaiyux · closed 3 months ago · 0 comments
#379 · Deployment of TensorRT-LLM Model on Triton Server · jasonngap1 · closed 3 months ago · 2 comments
#378 · Verify if inflight batching is running · bprus · closed 4 weeks ago · 3 comments
#377 · Triton Server Hangs when loading multi-GPU LLAMA2 Engine · mindhash · opened 3 months ago · 1 comment
#374 · convert_checkpoint.py not working - safetensors_rust.SafetensorError: Error while deserializing header: InvalidHeaderDeserialization · saurabhbhagwat · opened 3 months ago · 3 comments
#373 · Update TensorRT-LLM backend · kaiyux · closed 3 months ago · 0 comments
#372 · Invalid argument: model input cannot have empty reshape for non-batching model as scalar tensors are not supported for tensorrt_llm · mse700 · closed 3 months ago · 0 comments
#371 · tritonserver crashes (SIGNAL 11) when OpenTelemetry trace is enabled for the trtllm backend · npuichigo · closed 2 months ago · 3 comments
#370 · Getting gemmPlugin error for Mixtral model on v0.8.0 while hosting on Triton server · sarthak-phatate · opened 3 months ago · 1 comment
#369 · Infer failed: Unable to parse 'data': Shape does not match true shape of 'data' field in generate endpoint · bprus · opened 3 months ago · 0 comments
#368 · Using BERT/RoBERTa with the "tensorrtllm" backend directly? (no Python lib like the tensorrt-llm package) · pommedeterresautee · opened 3 months ago · 5 comments
#367 · Streaming mode doesn't work · dongteng · opened 3 months ago · 2 comments
#366 · Memory available for KV cache using the Triton TRT-LLM backend is lower than using TRT-LLM directly · UnyieldingOrca · opened 3 months ago · 3 comments
#365 · Typo in README decoupled mode: make text consistent for the boolean variable · esnvidia · opened 3 months ago · 1 comment
#364 · [BUG] Missing `tokenizer_type` parameter in config.pbtxt · esnvidia · opened 3 months ago · 1 comment
#363 · CUDA runtime error in cudaDeviceGetDefaultMemPool · tobernat · opened 3 months ago · 7 comments
#362 · Update TensorRT-LLM backend · kaiyux · closed 3 months ago · 0 comments