# triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Documentation: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
License: BSD 3-Clause "New" or "Revised" License · 8.38k stars · 1.49k forks
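
To ground the description above, here is a minimal sketch of querying a running Triton server over its HTTP endpoint with the official `tritonclient` Python package. The server address, model name (`my_model`), and tensor names (`INPUT0`, `OUTPUT0`) are illustrative assumptions, not taken from this page.

```python
# Minimal sketch: send one inference request to a Triton server over HTTP.
# Assumes a server is listening on localhost:8000 and serves a hypothetical
# model "my_model" with a float32 input "INPUT0" of shape [1, 16] and an
# output "OUTPUT0". Install the client with: pip install tritonclient[http]
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Describe the input tensor and attach the request payload.
infer_input = httpclient.InferInput("INPUT0", [1, 16], "FP32")
infer_input.set_data_from_numpy(np.random.rand(1, 16).astype(np.float32))

# Ask the server to run the model and return the named output.
response = client.infer(
    model_name="my_model",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("OUTPUT0")],
)
print(response.as_numpy("OUTPUT0"))
```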
## Issues (sorted by newest)
| # | Title | Author | State | Age | Comments |
|---|-------|--------|-------|-----|----------|
| #7675 | [doc]Adjusted formatting of the warning | oandreeva-nv | closed | 1 month ago | 0 |
| #7674 | ci: Reducing flakiness of `L0_python_api` | KrishnanPrash | closed | 1 month ago | 0 |
| #7673 | [docs] Removed vLLM meetup announcement | oandreeva-nv | closed | 1 month ago | 0 |
| #7672 | Histogram Metric for multi-instance tail latency aggregation | AshwinAmbal | opened | 1 month ago | 1 |
| #7671 | fix: `tritonfrontend` gRPC Streaming Segmentation Fault | KrishnanPrash | closed | 1 month ago | 2 |
| #7670 | DCGM unable to start: DCGM initialization error,Error: Failed to initialize NVML | coder-2014 | opened | 1 month ago | 2 |
| #7667 | Error: ensemble of tensorrt + python_be + tensorrt is supported on jetson? | olivetom | opened | 1 month ago | 12 |
| #7666 | feat: Add copyright hook | pranavm-nvidia | closed | 1 month ago | 6 |
| #7665 | Triton ensemble LLM model (Llama 3.1 8B Instruct) returns prompt in the output | alvaroalfaro612 | closed | 1 month ago | 2 |
| #7664 | When there are multiple GPU, only one GPU is used | gyr66 | opened | 2 months ago | 4 |
| #7663 | chore: Fix argparse typo, cleanup argparse groups, make kserve frontends optional | rmccorm4 | closed | 1 month ago | 1 |
| #7662 | feat: KServe Bindings to start tritonfrontend | KrishnanPrash | closed | 2 months ago | 1 |
| #7661 | ERROR: Failed to create instance: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'builder_config' not found | winstxnhdw | closed | 2 months ago | 1 |
| #7660 | Direct Streaming of Model Weights from Cloud Storage to GPU Memory | azsh1725 | opened | 2 months ago | 4 |
| #7659 | Release: Update README for r24.09 | pvijayakrish | closed | 2 months ago | 0 |
| #7658 | Build: Update server in master post 24.09 | pvijayakrish | closed | 1 month ago | 0 |
| #7657 | Instance_group config behaves strangely in the example jetson/concurrency_and_dynamic_batching | alphadadajuju | closed | 1 month ago | 1 |
| #7656 | BUG - Error: in triplet x64-windows: Unable to find a valid Visual Studio instance | mhbassel | closed | 2 months ago | 1 |
| #7655 | wired inference time for tritonserver | qiuzhewei | opened | 2 months ago | 1 |
| #7654 | Deploy TTS model with Triton and onnx backend, failed:Protobuf parsing failed | AnasAlmana | opened | 2 months ago | 5 |
| #7653 | perf: Improve chat completions performance at high concurrency | rmccorm4 | closed | 2 months ago | 1 |
| #7651 | Best Practices for Integrating Custom CUDA Code in Triton Server Inference Pipeline | saurabh203 | closed | 2 months ago | 0 |
| #7650 | Big performance drop when using ensemble model over separate calls | jcuquemelle | opened | 2 months ago | 2 |
| #7649 | [Critical] Triton stops processing requests and crashes | appearancefnp | opened | 2 months ago | 6 |
| #7647 | python_backend pytorch example as_numpy() error | flian2 | closed | 2 weeks ago | 7 |
| #7646 | AsyncStreamTransfer memory leak | tricky61 | closed | 2 months ago | 0 |
| #7644 | Qwen2-14B inference garbled code | kazyun | closed | 2 months ago | 0 |
| #7643 | Make State Tensor Stay in Device Memory | poor1017 | opened | 2 months ago | 1 |
| #7642 | Temporarily disable corner cases for response sender test | Tabrizian | opened | 2 months ago | 0 |
| #7641 | How many instances can Triton support for parallel inference at most? | wwdok | opened | 2 months ago | 0 |
| #7640 | docs: Fix broken links | emmanuel-ferdman | closed | 1 month ago | 1 |
| #7639 | incompatible constructor arguments for c_python_backend_utils.InferenceRequest | adrtsang | opened | 2 months ago | 2 |
| #7638 | triton gpu deploy suddenly become very slow from 0.03s to 12s, how to solve it ? | yiluzhuimeng | opened | 2 months ago | 1 |
| #7637 | Can TIS run both vllm and torch backend together? | k0286 | closed | 2 months ago | 2 |
| #7636 | UNAVAILABLE: Not found: unable to load shared library: %1 is not a valid Win32 application | mhbassel | closed | 1 month ago | 5 |
| #7634 | ci: Set stability factor to a higher value | lkomali | closed | 2 months ago | 0 |
| #7633 | Dockerfile.win10.min - Update dependency versions | mc-nv | closed | 2 months ago | 0 |
| #7632 | triton 24.08: "Poll failed for model directory 'ensemble': unexpected platform type 'ensemble' for ensemble" | xiejibing | closed | 1 month ago | 3 |
| #7631 | Triton gives wrong output | Tpoc311 | closed | 2 months ago | 1 |
| #7630 | ./fetch_models.sh - unable to resolve host address | surprisedPikachu007 | closed | 3 weeks ago | 1 |
| #7629 | [feature request] ffmpeg backend for simplifying decoding of audio/video inputs | vadimkantorov | opened | 2 months ago | 1 |
| #7628 | Is it possible to disable fallback on CPU? | Pavloveuge | closed | 2 months ago | 2 |
| #7627 | Does triton inference server support customers custom feature but do not need to modify the origin code, like some plugin feature? | GGBond8488 | opened | 2 months ago | 2 |
| #7626 | Failed to unload model (vLLM Backend) after running inference in streaming mode | TabTabWooo | closed | 1 month ago | 2 |
| #7624 | fix: usage of ReadDataFromJson in array tensors | v-shobhit | closed | 1 month ago | 1 |
| #7623 | First invocation of model - Dynamic batching doesn't work - Python Backend | ChristosCh00 | opened | 2 months ago | 0 |
| #7622 | High Queue Latency With BLS | SandraWang-SH | opened | 2 months ago | 4 |
| #7621 | Update fetch_models.sh | vd-nv | closed | 2 months ago | 0 |
| #7620 | Ov2024.3 | nnshah1 | closed | 2 months ago | 0 |
| #7619 | Cherry-pick: Fix: Add mutex lock for state completion check in gRPC streaming to prevent race condition | pskiran1 | closed | 2 months ago | 0 |