triton-inference-server / tensorrtllm_backend
The Triton TensorRT-LLM Backend · Apache License 2.0 · 581 stars · 81 forks
Issues (sorted by: Newest)
[request] Add example of custom LLM model not based on huggingface (#465) · michaelnny · closed 1 month ago · 0 comments
[Bug] Zero temperature curl request affects non-zero temperature requests (#464) · Hao-YunDeng · closed 4 days ago · 5 comments
Can you provide an example of a visual language model or multimodal model launch by triton server? (#463) · lzcchl · opened 1 month ago · 6 comments
How to deploy one model instance across multiple GPUs to tackle the OOM problem? (#462) · shil3754 · opened 1 month ago · 6 comments
decoding_mode top_k_top_p does not take effect for llama2 not same with huggingface (#461) · yjjiang11 · opened 1 month ago · 3 comments
Implement XC-Cache to improve long context inference performance (#460) · avianion · opened 1 month ago · 1 comment
Tritonserver won't start up running Smaug 34b (#459) · workuser12345 · opened 1 month ago · 1 comment
two seemingly identical functions in the same file (#458) · dongluw · opened 1 month ago · 1 comment
Mixtral 8x7-v0.1 Hangs after serving a few requests (#457) · aaditya-srivathsan · closed 3 days ago · 6 comments
OOM Running Gemma 7B triton runtime engine (#456) · workuser12345 · closed 1 month ago · 1 comment
Update TensorRT-LLM backend (#454) · kaiyux · closed 1 month ago · 0 comments
tensorrt-llm serving performance is extreamly low for llama3-8b (#453) · RunningLeon · closed 1 month ago · 3 comments
Replace subprocess.Popen with subprocess.run (#452) · rlempka · opened 1 month ago · 0 comments
what is the matching version of the triton-trtllm (trtlllm v0.9.0) image in NGC ? (#451) · tongjinle123 · closed 1 month ago · 0 comments
TllmXqaJit runtime error when build Yi-6B fp8 with TRTLLM-0.10.0.dev2024050700 (#450) · kimbaol · closed 1 month ago · 0 comments
[tensorrt-llm backend] A question about launch_triton_server.py (#455) · victorsoda · opened 1 month ago · 2 comments
FIX link reference in README.md (#449) · sunjiabin17 · closed 5 days ago · 1 comment
Example `gpu_device_ids` for multi-model usage? (#448) · vnkc1 · opened 1 month ago · 1 comment
[MINOR] Fix typo in README (#447) · kooyunmo · closed 2 weeks ago · 1 comment
make add_special_tokens/skip_special_tokens default value is true which align with hf setting (#446) · XiaobingSuper · closed 1 month ago · 2 comments
There is a problem with llama 7B model pre-processing after using triton server (#445) · Graham1025 · closed 1 month ago · 5 comments
Update TensorRT-LLM backend (#444) · kaiyux · closed 1 month ago · 0 comments
[BUG] coredump when process exit triggered after TRITONSERVER_ServerDelete (#443) · hzlushiliang · closed 1 month ago · 1 comment
InFlightBatching seems not working (#442) · larme · opened 1 month ago · 3 comments
Fix batch manager stats link (#441) · rmccorm4 · closed 3 weeks ago · 0 comments
Deployement failed for BERT (#440) · vivekjoshi556 · opened 1 month ago · 1 comment
Update TensorRT-LLM backend (#439) · kaiyux · closed 2 months ago · 0 comments
Deploying Mixtral-8x7B-v0.1 with Triton 24.02 on A100 (160GB) raises "Cuda Runtime (out of memory)" exception (#438) · kelkarn · opened 2 months ago · 2 comments
GptManager's scalability issues with input & output parameters (#437) · service-kit · opened 2 months ago · 1 comment
How to post sample parameters (like top_k, temperature) for triton http server (#436) · wanzhenchn · closed 1 month ago · 5 comments
Encountered an error in forward function: std::bad_cast (#435) · wangqy1216 · opened 2 months ago · 1 comment
LLama 7B model can't get longer ouput text after using triton server (#434) · XiaobingSuper · closed 1 month ago · 0 comments
only python grpc client can cancle request, when will support with golang grpc (#433) · jiuweisu · closed 2 months ago · 1 comment
add speculative decoding example (#432) · XiaobingSuper · opened 2 months ago · 3 comments
Update TensorRT-LLM backend (#431) · kaiyux · closed 2 months ago · 0 comments
When to expect NGC container for v0.9.0 like 24.0x-trtllm-python-py3 (#430) · ekarmazin · closed 1 month ago · 1 comment
`max_batch_size` seems to have no impact on model performance (#429) · VitalyPetrov · opened 2 months ago · 8 comments
Performance Issue with return_context_logits Enabled in TensorRT-LLM (#428) · gywlssww · opened 2 months ago · 0 comments
Update TensorRT-LLM backend (#427) · kaiyux · closed 2 months ago · 0 comments
Error when trying to deploy tuned LoRA LLM models to production (#426) · frankh077 · closed 2 months ago · 1 comment
Seg fault after loaded models in official example (#425) · LeatherDeerAU · opened 2 months ago · 2 comments
Can't launch triton server following docs, expecting [TensorRT] library version 9.2.0.5 got 9.3.0.1 (#424) · conway-abacus · opened 2 months ago · 5 comments
Fixed Whitespace Error in Streaming mode (#423) · enochlev · opened 2 months ago · 0 comments
How is GptManager used in Triton backend? (#421) · ekagra-ranjan · opened 2 months ago · 1 comment
How does Triton know which codepath to choose based on `backend` in config.pbtxt being "tensorrtllm" or "python" (#420) · ekagra-ranjan · closed 2 months ago · 1 comment
Performance Issue with return_context_logits Enabled in TensorRT-LLM (#419) · metterian · opened 2 months ago · 1 comment
Filtering beam_search output tensors results in a string output vs list (#418) · nikhilshandilya · opened 2 months ago · 1 comment
Warmup Example of loading LoRa weights (#417) · TheCodeWrangler · opened 2 months ago · 6 comments
Using inflight decoding for `tensorrt_llm_bls` mode (#416) · XiaobingSuper · closed 2 months ago · 2 comments
The crash occurred when attempting to quantize the LLaMA model with W4A(fp)8_AWQ. (#415) · pandengyao · closed 2 months ago · 4 comments