# triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Documentation: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
License: BSD 3-Clause "New" or "Revised" License · 8.38k stars · 1.49k forks
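
To ground the description above, here is a minimal sketch of querying a running Triton server over its HTTP endpoint with the official `tritonclient` Python package. The server address, model name (`my_model`), and tensor names (`INPUT0`, `OUTPUT0`) are illustrative assumptions, not taken from this page.

```python
# Minimal sketch: send one inference request to a Triton server over HTTP.
# Assumes a server is listening on localhost:8000 and serves a hypothetical
# model "my_model" with a float32 input "INPUT0" of shape [1, 16] and an
# output "OUTPUT0". Install the client with: pip install tritonclient[http]
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Describe the input tensor and attach the request payload.
infer_input = httpclient.InferInput("INPUT0", [1, 16], "FP32")
infer_input.set_data_from_numpy(np.random.rand(1, 16).astype(np.float32))

# Ask the server to run the model and return the named output.
response = client.infer(
    model_name="my_model",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("OUTPUT0")],
)
print(response.as_numpy("OUTPUT0"))
```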
## Issues (sorted by newest)
| # | Title | Author | State | Age | Comments |
|---|-------|--------|-------|-----|----------|
| #7675 | [doc]Adjusted formatting of the warning | oandreeva-nv | closed | 1 month ago | 0 |
| #7674 | ci: Reducing flakiness of `L0_python_api` | KrishnanPrash | closed | 1 month ago | 0 |
| #7673 | [docs] Removed vLLM meetup announcement | oandreeva-nv | closed | 1 month ago | 0 |
| #7672 | Histogram Metric for multi-instance tail latency aggregation | AshwinAmbal | opened | 1 month ago | 1 |
| #7671 | fix: `tritonfrontend` gRPC Streaming Segmentation Fault | KrishnanPrash | closed | 1 month ago | 2 |
| #7670 | DCGM unable to start: DCGM initialization error,Error: Failed to initialize NVML | coder-2014 | opened | 1 month ago | 2 |
| #7667 | Error: ensemble of tensorrt + python_be + tensorrt is supported on jetson? | olivetom | opened | 1 month ago | 12 |
| #7666 | feat: Add copyright hook | pranavm-nvidia | closed | 1 month ago | 6 |
| #7665 | Triton ensemble LLM model (Llama 3.1 8B Instruct) returns prompt in the output | alvaroalfaro612 | closed | 1 month ago | 2 |
| #7664 | When there are multiple GPU, only one GPU is used | gyr66 | opened | 2 months ago | 4 |
| #7663 | chore: Fix argparse typo, cleanup argparse groups, make kserve frontends optional | rmccorm4 | closed | 1 month ago | 1 |
| #7662 | feat: KServe Bindings to start tritonfrontend | KrishnanPrash | closed | 2 months ago | 1 |
| #7661 | ERROR: Failed to create instance: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'builder_config' not found | winstxnhdw | closed | 2 months ago | 1 |
| #7660 | Direct Streaming of Model Weights from Cloud Storage to GPU Memory | azsh1725 | opened | 2 months ago | 4 |
| #7659 | Release: Update README for r24.09 | pvijayakrish | closed | 2 months ago | 0 |
| #7658 | Build: Update server in master post 24.09 | pvijayakrish | closed | 1 month ago | 0 |
| #7657 | Instance_group config behaves strangely in the example jetson/concurrency_and_dynamic_batching | alphadadajuju | closed | 1 month ago | 1 |
| #7656 | BUG - Error: in triplet x64-windows: Unable to find a valid Visual Studio instance | mhbassel | closed | 2 months ago | 1 |
| #7655 | wired inference time for tritonserver | qiuzhewei | opened | 2 months ago | 1 |
| #7654 | Deploy TTS model with Triton and onnx backend, failed:Protobuf parsing failed | AnasAlmana | opened | 2 months ago | 5 |
| #7653 | perf: Improve chat completions performance at high concurrency | rmccorm4 | closed | 2 months ago | 1 |
| #7651 | Best Practices for Integrating Custom CUDA Code in Triton Server Inference Pipeline | saurabh203 | closed | 2 months ago | 0 |
| #7650 | Big performance drop when using ensemble model over separate calls | jcuquemelle | opened | 2 months ago | 2 |
| #7649 | [Critical] Triton stops processing requests and crashes | appearancefnp | opened | 2 months ago | 6 |
| #7647 | python_backend pytorch example as_numpy() error | flian2 | closed | 2 weeks ago | 7 |
| #7646 | AsyncStreamTransfer memory leak | tricky61 | closed | 2 months ago | 0 |
| #7644 | Qwen2-14B inference garbled code | kazyun | closed | 2 months ago | 0 |
| #7643 | Make State Tensor Stay in Device Memory | poor1017 | opened | 2 months ago | 1 |
| #7642 | Temporarily disable corner cases for response sender test | Tabrizian | opened | 2 months ago | 0 |
| #7641 | How many instances can Triton support for parallel inference at most? | wwdok | opened | 2 months ago | 0 |
| #7640 | docs: Fix broken links | emmanuel-ferdman | closed | 1 month ago | 1 |
| #7639 | incompatible constructor arguments for c_python_backend_utils.InferenceRequest | adrtsang | opened | 2 months ago | 2 |
| #7638 | triton gpu deploy suddenly become very slow from 0.03s to 12s, how to solve it ? | yiluzhuimeng | opened | 2 months ago | 1 |
| #7637 | Can TIS run both vllm and torch backend together? | k0286 | closed | 2 months ago | 2 |
| #7636 | UNAVAILABLE: Not found: unable to load shared library: %1 is not a valid Win32 application | mhbassel | closed | 1 month ago | 5 |
| #7634 | ci: Set stability factor to a higher value | lkomali | closed | 2 months ago | 0 |
| #7633 | Dockerfile.win10.min - Update dependency versions | mc-nv | closed | 2 months ago | 0 |
| #7632 | triton 24.08: "Poll failed for model directory 'ensemble': unexpected platform type 'ensemble' for ensemble" | xiejibing | closed | 1 month ago | 3 |
| #7631 | Triton gives wrong output | Tpoc311 | closed | 2 months ago | 1 |
| #7630 | ./fetch_models.sh - unable to resolve host address | surprisedPikachu007 | closed | 3 weeks ago | 1 |
| #7629 | [feature request] ffmpeg backend for simplifying decoding of audio/video inputs | vadimkantorov | opened | 2 months ago | 1 |
| #7628 | Is it possible to disable fallback on CPU? | Pavloveuge | closed | 2 months ago | 2 |
| #7627 | Does triton inference server support customers custom feature but do not need to modify the origin code, like some plugin feature? | GGBond8488 | opened | 2 months ago | 2 |
| #7626 | Failed to unload model (vLLM Backend) after running inference in streaming mode | TabTabWooo | closed | 1 month ago | 2 |
| #7624 | fix: usage of ReadDataFromJson in array tensors | v-shobhit | closed | 1 month ago | 1 |
| #7623 | First invocation of model - Dynamic batching doesn't work - Python Backend | ChristosCh00 | opened | 2 months ago | 0 |
| #7622 | High Queue Latency With BLS | SandraWang-SH | opened | 2 months ago | 4 |
| #7621 | Update fetch_models.sh | vd-nv | closed | 2 months ago | 0 |
| #7620 | Ov2024.3 | nnshah1 | closed | 2 months ago | 0 |
| #7619 | Cherry-pick: Fix: Add mutex lock for state completion check in gRPC streaming to prevent race condition | pskiran1 | closed | 2 months ago | 0 |