triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License · 7.53k stars · 1.4k forks

Issues (newest first)
Add support for response sender in the default mode (#7311) · kthui · opened 1 day ago · 0 comments
ci: Support BF16 data type in TensorRT backend (#7310) · pskiran1 · opened 1 day ago · 0 comments
build: Update vllm version to v0.4.3 (latest) (#7309) · oandreeva-nv · opened 1 day ago · 0 comments
triton malloc fail (#7308) · MouseSun846 · opened 1 day ago · 1 comment
unexpected datatype TYPE_INT64 for inference input, expecting TYPE_INT32 (#7307) · CallmeZhangChenchen · opened 1 day ago · 0 comments
docs: Add default template that diverts to sub templates (#7306) · jbkyang-nvi · closed 2 days ago · 0 comments
Add TT-Metalium as a backend (#7305) · jvasilje · opened 2 days ago · 0 comments
fix: Fix L0_input_validation--base (#7304) · yinggeh · opened 2 days ago · 3 comments
Why is my model in ensemble receiving out-of-order input (#7303) · Joenhle · opened 2 days ago · 1 comment
Tritonserver for FIL backend not starting (#7301) · lee-tunnicliffe · opened 2 days ago · 0 comments
Any example of triton-vllm with c++ client? (#7300) · tricky61 · closed 2 days ago · 0 comments
Update openvino to 2024.0.0 (#7299) · krishung5 · closed 2 days ago · 0 comments
Update 'main' post 24.05 release (#7298) · tanmayv25 · closed 3 days ago · 1 comment
Update 'main' post 24.05 release (#7297) · tanmayv25 · closed 3 days ago · 0 comments
ONNX backend with TensorRT optimizer sometimes fails to start (#7296) · ShuaiShao93 · opened 3 days ago · 1 comment
How does Triton implement one instance to handle multiple requests simultaneously? (#7295) · SeibertronSS · opened 3 days ago · 1 comment
Incorrect asset tritonserver2.35.0-jetpack5.1.2-update-2.tgz (#7294) · joachimhgg · opened 3 days ago · 0 comments
triton-inference-server cannot be started (#7293) · tuninger · opened 3 days ago · 1 comment
Add test for improper response sending from model (#7292) · kthui · opened 4 days ago · 1 comment
Update main to track development for 2.47.0 / r24.06 (#7291) · tanmayv25 · closed 3 days ago · 0 comments
docs: Update PR templates (#7290) · jbkyang-nvi · closed 2 days ago · 1 comment
Backend support for .keras files? (#7289) · chriscarollo · opened 4 days ago · 0 comments
Revert file copy (#7288) · mc-nv · closed 4 days ago · 0 comments
Support histogram custom metric in Python backend (#7287) · ShuaiShao93 · opened 4 days ago · 2 comments
Add testing for libtorch cudnn (#7286) · Tabrizian · opened 4 days ago · 0 comments
What is the correct way to run inference in parallel in Triton? (#7283) · sandesha-hegde · opened 4 days ago · 0 comments
A Confusion about prefetch (#7282) · SunnyGhj · opened 4 days ago · 2 comments
Windows 10 docker build Error "Could not locate a complete Visual Studio instance" (#7281) · jinkilee · opened 5 days ago · 2 comments
Specific structure for ensemble model may cause deadlock (#7280) · ukus04 · opened 5 days ago · 0 comments
Automatically unload (oldest) models when memory is full (#7279) · elmuz · opened 5 days ago · 2 comments
YOLOv8n-poses is giving me a negative output error (#7278) · olooeez · opened 5 days ago · 2 comments
No 24.05-trtllm-python-py3 in NGC Repo (#7277) · avianion · closed 2 days ago · 2 comments
No trtllm tag in ngc for 24.05 (#7276) · TheCodeWrangler · closed 2 days ago · 4 comments
[Bug] Model 'ensemble' receives inputs originating from different decoupled models (#7275) · michaelnny · opened 1 week ago · 0 comments
Minor fix for L0_backend_python (#7274) · krishung5 · closed 4 days ago · 0 comments
Update README.md 2.46.0 / 24.05 (#7273) · mc-nv · closed 1 week ago · 0 comments
Triton BLS model with dynamic batching does not execute at the expected batch size (#7271) · njaramish · opened 1 week ago · 0 comments
How to deploy Triton Inference Server Container (tritonserver:24.04-trtllm-python-py3) in K8S without launching Triton Server directly? (#7270) · Ryan-ZL-Lin · closed 4 days ago · 0 comments
The method hangs (#7269) · fishfl · opened 1 week ago · 0 comments
Tritonserver hangs on launch with python backend (#7268) · JamesBowerXanda · opened 1 week ago · 1 comment
docker image for triton 24.04 has incorrect CUDA version reported (#7267) · stephanbertl · closed 4 days ago · 2 comments
Custom backend using recommended.cc not generating correct output (#7266) · jgrsdave · opened 1 week ago · 1 comment
Fix gRPC streaming non-decoupled segfault if sending response and final flag separately (#7265) · kthui · opened 1 week ago · 1 comment
Pods Receiving Traffic Too Early When Scaling with HPA Causes 'Socket Closed' Errors on Triton Inference Server (#7264) · patriksabol · opened 1 week ago · 6 comments
Add server-side metrics for input and output sizes (#7263) · yongbinfeng · opened 1 week ago · 1 comment
CUDA Failing to initialize in docker container (#7262) · regexboi · opened 1 week ago · 3 comments
Added new flag for GPU peer access API control (#7261) · indrajit96 · opened 1 week ago · 0 comments
Exclude Jax example from Python 3.8 (#7260) · krishung5 · closed 1 week ago · 0 comments
Return an error if --load-model is specified without explicit model control mode (#7259) · rmccorm4 · closed 1 week ago · 0 comments
Update expected error message (#7258) · kthui · opened 1 week ago · 1 comment