triton-inference-server / tensorrtllm_backend
The Triton TensorRT-LLM Backend · Apache License 2.0 · 581 stars · 81 forks
Issues (sorted by: Newest)
[request] Add example of custom LLM model not based on huggingface (#465) · michaelnny · closed 1 month ago · 0 comments
[Bug] Zero temperature curl request affects non-zero temperature requests (#464) · Hao-YunDeng · closed 4 days ago · 5 comments
Can you provide an example of a visual language model or multimodal model launch by triton server? (#463) · lzcchl · opened 1 month ago · 6 comments
How to deploy one model instance across multiple GPUs to tackle the OOM problem? (#462) · shil3754 · opened 1 month ago · 6 comments
decoding_mode top_k_top_p does not take effect for llama2 not same with huggingface (#461) · yjjiang11 · opened 1 month ago · 3 comments
Implement XC-Cache to improve long context inference performance (#460) · avianion · opened 1 month ago · 1 comment
Tritonserver won't start up running Smaug 34b (#459) · workuser12345 · opened 1 month ago · 1 comment
two seemingly identical functions in the same file (#458) · dongluw · opened 1 month ago · 1 comment
Mixtral 8x7-v0.1 Hangs after serving a few requests (#457) · aaditya-srivathsan · closed 3 days ago · 6 comments
OOM Running Gemma 7B triton runtime engine (#456) · workuser12345 · closed 1 month ago · 1 comment
Update TensorRT-LLM backend (#454) · kaiyux · closed 1 month ago · 0 comments
tensorrt-llm serving performance is extreamly low for llama3-8b (#453) · RunningLeon · closed 1 month ago · 3 comments
Replace subprocess.Popen with subprocess.run (#452) · rlempka · opened 1 month ago · 0 comments
what is the matching version of the triton-trtllm (trtlllm v0.9.0) image in NGC ? (#451) · tongjinle123 · closed 1 month ago · 0 comments
TllmXqaJit runtime error when build Yi-6B fp8 with TRTLLM-0.10.0.dev2024050700 (#450) · kimbaol · closed 1 month ago · 0 comments
[tensorrt-llm backend] A question about launch_triton_server.py (#455) · victorsoda · opened 1 month ago · 2 comments
FIX link reference in README.md (#449) · sunjiabin17 · closed 5 days ago · 1 comment
Example `gpu_device_ids` for multi-model usage? (#448) · vnkc1 · opened 1 month ago · 1 comment
[MINOR] Fix typo in README (#447) · kooyunmo · closed 2 weeks ago · 1 comment
make add_special_tokens/skip_special_tokens default value is true which align with hf setting (#446) · XiaobingSuper · closed 1 month ago · 2 comments
There is a problem with llama 7B model pre-processing after using triton server (#445) · Graham1025 · closed 1 month ago · 5 comments
Update TensorRT-LLM backend (#444) · kaiyux · closed 1 month ago · 0 comments
[BUG] coredump when process exit triggered after TRITONSERVER_ServerDelete (#443) · hzlushiliang · closed 1 month ago · 1 comment
InFlightBatching seems not working (#442) · larme · opened 1 month ago · 3 comments
Fix batch manager stats link (#441) · rmccorm4 · closed 3 weeks ago · 0 comments
Deployement failed for BERT (#440) · vivekjoshi556 · opened 1 month ago · 1 comment
Update TensorRT-LLM backend (#439) · kaiyux · closed 2 months ago · 0 comments
Deploying Mixtral-8x7B-v0.1 with Triton 24.02 on A100 (160GB) raises "Cuda Runtime (out of memory)" exception (#438) · kelkarn · opened 2 months ago · 2 comments
GptManager's scalability issues with input & output parameters (#437) · service-kit · opened 2 months ago · 1 comment
How to post sample parameters (like top_k, temperature) for triton http server (#436) · wanzhenchn · closed 1 month ago · 5 comments
Encountered an error in forward function: std::bad_cast (#435) · wangqy1216 · opened 2 months ago · 1 comment
LLama 7B model can't get longer ouput text after using triton server (#434) · XiaobingSuper · closed 1 month ago · 0 comments
only python grpc client can cancle request, when will support with golang grpc (#433) · jiuweisu · closed 2 months ago · 1 comment
add speculative decoding example (#432) · XiaobingSuper · opened 2 months ago · 3 comments
Update TensorRT-LLM backend (#431) · kaiyux · closed 2 months ago · 0 comments
When to expect NGC container for v0.9.0 like 24.0x-trtllm-python-py3 (#430) · ekarmazin · closed 1 month ago · 1 comment
`max_batch_size` seems to have no impact on model performance (#429) · VitalyPetrov · opened 2 months ago · 8 comments
Performance Issue with return_context_logits Enabled in TensorRT-LLM (#428) · gywlssww · opened 2 months ago · 0 comments
Update TensorRT-LLM backend (#427) · kaiyux · closed 2 months ago · 0 comments
Error when trying to deploy tuned LoRA LLM models to production (#426) · frankh077 · closed 2 months ago · 1 comment
Seg fault after loaded models in official example (#425) · LeatherDeerAU · opened 2 months ago · 2 comments
Can't launch triton server following docs, expecting [TensorRT] library version 9.2.0.5 got 9.3.0.1 (#424) · conway-abacus · opened 2 months ago · 5 comments
Fixed Whitespace Error in Streaming mode (#423) · enochlev · opened 2 months ago · 0 comments
How is GptManager used in Triton backend? (#421) · ekagra-ranjan · opened 2 months ago · 1 comment
How does Triton know which codepath to choose based on `backend` in config.pbtxt being "tensorrtllm" or "python" (#420) · ekagra-ranjan · closed 2 months ago · 1 comment
Performance Issue with return_context_logits Enabled in TensorRT-LLM (#419) · metterian · opened 2 months ago · 1 comment
Filtering beam_search output tensors results in a string output vs list (#418) · nikhilshandilya · opened 2 months ago · 1 comment
Warmup Example of loading LoRa weights (#417) · TheCodeWrangler · opened 2 months ago · 6 comments
Using inflight decoding for `tensorrt_llm_bls` mode (#416) · XiaobingSuper · closed 2 months ago · 2 comments
The crash occurred when attempting to quantize the LLaMA model with W4A(fp)8_AWQ. (#415) · pandengyao · closed 2 months ago · 4 comments