triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License · 7.53k stars · 1.4k forks

Issues (newest first)
Add support for response sender in the default mode (#7311) · kthui · opened 1 day ago · 0 comments
ci: Support BF16 data type in TensorRT backend (#7310) · pskiran1 · opened 1 day ago · 0 comments
build: Update vllm version to v0.4.3 (latest) (#7309) · oandreeva-nv · opened 1 day ago · 0 comments
triton malloc fail (#7308) · MouseSun846 · opened 1 day ago · 1 comment
unexpected datatype TYPE_INT64 for inference input, expecting TYPE_INT32 (#7307) · CallmeZhangChenchen · opened 1 day ago · 0 comments
docs: Add default template that diverts to sub templates (#7306) · jbkyang-nvi · closed 2 days ago · 0 comments
Add TT-Metalium as a backend (#7305) · jvasilje · opened 2 days ago · 0 comments
fix: Fix L0_input_validation--base (#7304) · yinggeh · opened 2 days ago · 3 comments
Why is my model in ensemble receiving out-of-order input (#7303) · Joenhle · opened 2 days ago · 1 comment
Tritonserver for FIL backend not starting (#7301) · lee-tunnicliffe · opened 2 days ago · 0 comments
Any example of triton-vllm with c++ client? (#7300) · tricky61 · closed 2 days ago · 0 comments
Update openvino to 2024.0.0 (#7299) · krishung5 · closed 2 days ago · 0 comments
Update 'main' post 24.05 release (#7298) · tanmayv25 · closed 3 days ago · 1 comment
Update 'main' post 24.05 release (#7297) · tanmayv25 · closed 3 days ago · 0 comments
ONNX backend with TensorRT optimizer sometimes fails to start (#7296) · ShuaiShao93 · opened 3 days ago · 1 comment
How does Triton implement one instance to handle multiple requests simultaneously? (#7295) · SeibertronSS · opened 3 days ago · 1 comment
Incorrect asset tritonserver2.35.0-jetpack5.1.2-update-2.tgz (#7294) · joachimhgg · opened 3 days ago · 0 comments
triton-inference-server cannot be started (#7293) · tuninger · opened 3 days ago · 1 comment
Add test for improper response sending from model (#7292) · kthui · opened 4 days ago · 1 comment
Update main to track development for 2.47.0 / r24.06 (#7291) · tanmayv25 · closed 3 days ago · 0 comments
docs: Update PR templates (#7290) · jbkyang-nvi · closed 2 days ago · 1 comment
Backend support for .keras files? (#7289) · chriscarollo · opened 4 days ago · 0 comments
Revert file copy (#7288) · mc-nv · closed 4 days ago · 0 comments
Support histogram custom metric in Python backend (#7287) · ShuaiShao93 · opened 4 days ago · 2 comments
Add testing for libtorch cudnn (#7286) · Tabrizian · opened 4 days ago · 0 comments
What is the correct way to run inference in parallel in Triton? (#7283) · sandesha-hegde · opened 4 days ago · 0 comments
A Confusion about prefetch (#7282) · SunnyGhj · opened 4 days ago · 2 comments
Windows 10 docker build Error "Could not locate a complete Visual Studio instance" (#7281) · jinkilee · opened 5 days ago · 2 comments
Specific structure for ensemble model may cause deadlock (#7280) · ukus04 · opened 5 days ago · 0 comments
Automatically unload (oldest) models when memory is full (#7279) · elmuz · opened 5 days ago · 2 comments
YOLOv8n-poses is giving me a negative output error (#7278) · olooeez · opened 5 days ago · 2 comments
No 24.05-trtllm-python-py3 in NGC Repo (#7277) · avianion · closed 2 days ago · 2 comments
No trtllm tag in ngc for 24.05 (#7276) · TheCodeWrangler · closed 2 days ago · 4 comments
[Bug] Model 'ensemble' receives inputs originating from different decoupled models (#7275) · michaelnny · opened 1 week ago · 0 comments
Minor fix for L0_backend_python (#7274) · krishung5 · closed 4 days ago · 0 comments
Update README.md 2.46.0 / 24.05 (#7273) · mc-nv · closed 1 week ago · 0 comments
Triton BLS model with dynamic batching does not execute at the expected batch size (#7271) · njaramish · opened 1 week ago · 0 comments
How to deploy Triton Inference Server Container (tritonserver:24.04-trtllm-python-py3) in K8S without launching Triton Server directly? (#7270) · Ryan-ZL-Lin · closed 4 days ago · 0 comments
The method hangs (#7269) · fishfl · opened 1 week ago · 0 comments
Tritonserver hangs on launch with python backend (#7268) · JamesBowerXanda · opened 1 week ago · 1 comment
docker image for triton 24.04 has incorrect CUDA version reported (#7267) · stephanbertl · closed 4 days ago · 2 comments
Custom backend using recommended.cc not generating correct output (#7266) · jgrsdave · opened 1 week ago · 1 comment
Fix gRPC streaming non-decoupled segfault if sending response and final flag separately (#7265) · kthui · opened 1 week ago · 1 comment
Pods Receiving Traffic Too Early When Scaling with HPA Causes 'Socket Closed' Errors on Triton Inference Server (#7264) · patriksabol · opened 1 week ago · 6 comments
Add server-side metrics for input and output sizes (#7263) · yongbinfeng · opened 1 week ago · 1 comment
CUDA Failing to initialize in docker container (#7262) · regexboi · opened 1 week ago · 3 comments
Added new flag for GPU peer access API control (#7261) · indrajit96 · opened 1 week ago · 0 comments
Exclude Jax example from Python 3.8 (#7260) · krishung5 · closed 1 week ago · 0 comments
Return an error if --load-model is specified without explicit model control mode (#7259) · rmccorm4 · closed 1 week ago · 0 comments
Update expected error message (#7258) · kthui · opened 1 week ago · 1 comment