triton-inference-server server issues

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

BSD 3-Clause "New" or "Revised" License

8.39k stars, 1.49k forks

issues

Build: Updating to allow passing DOCKER_GPU_ARGS at model generation

#7566 pvijayakrish closed 3 months ago
0
Release: Update NGC versions post-24.08 release

#7565 pvijayakrish closed 3 months ago
0
Release: Update README for r24.08

#7564 pvijayakrish closed 3 months ago
0
docs: Add tensorrtllm_backend into doc generation

#7563 krishung5 closed 3 months ago
0
[ERROR] No available memory for the cache blocks.

#7562 TheNha opened 3 months ago
0
feat: OpenAI Compatible Frontend

#7561 rmccorm4 closed 1 month ago
7
test: Load new model version should not reload loaded existing model …

#7560 mc-nv closed 3 months ago
0
ci: Raise Documentation Generation Errors

#7559 fpetrini15 closed 3 months ago
1
How is the order determined for loading a model onto a specific device?

#7558 mhbassel closed 2 months ago
5
floating point exception with Triton version 24.07 when loading tensorrt_llm backend models

#7556 janpetrov closed 2 months ago
1
feat: Add GRPC error codes to GRPC streaming if enabled by user. (#7499)

#7555 mc-nv closed 3 months ago
0
Intermittent `L0_decoupled_grpc_error` crash fixed. (#7552)

#7554 mc-nv closed 3 months ago
0
test: Load new model version should not reload loaded existing model …

#7553 kthui closed 3 months ago
1
Intermittent `L0_decoupled_grpc_error` crash fixed.

#7552 indrajit96 closed 3 months ago
0
Build Triton and Backends On Windows

#7551 mhbassel closed 3 months ago
4
Can't load custom backend shared library from s3 (24.07)

#7550 gerasim13 opened 3 months ago
2
tritonserver preload trt plugin got warning message and many core files : Failed to compile generated PTX with ptxas. Falling back to compilation by driver.

#7549 LinGeLin opened 3 months ago
0
low performance at large concurrent requests

#7548 seyunchoi opened 3 months ago
5
Encounter `Stub process is not healthy` only with kserve pod

#7547 thechaos16 closed 3 months ago
1
feat: Add vLLM counter metrics access through Triton (#7493)

#7546 mc-nv closed 3 months ago
0
test: Add python backend tests for the new histogram metric (#7540)

#7545 mc-nv closed 3 months ago
0
Build: Update Vllm version for 24.08

#7544 pvijayakrish closed 3 months ago
0
[feature request] C# / .NET bindings for in-proc C-API and in-proc wrapper's C++-API

#7543 vadimkantorov opened 3 months ago
3
Indrajit r24.08 cp

#7542 indrajit96 closed 3 months ago
1
Inconsistent prediction results using onnx backend with tensorrt enabled

#7541 fangpings opened 3 months ago
0
test: Add python backend tests for the new histogram metric

#7540 yinggeh closed 3 months ago
2
Build: Upgrading vLLM version for 24.08 release

#7539 pvijayakrish closed 3 months ago
0
Build: Upgrade vLLM version for 24.08 release

#7538 pvijayakrish closed 3 months ago
0
docs: Load new model version should not reload loaded existing model version(s)

#7537 kthui closed 3 months ago
1
build: RHEL8 EA2 Backends

#7535 fpetrini15 closed 3 months ago
2
Discrepancy in Inference Timing between trtexec and Triton Server(TensorRT backend) with gRPC Communication for YOLOV8

#7533 twotwoiscute closed 3 months ago
1
Support request cancellation on timeout for sync grpc client

#7532 ShuaiShao93 opened 3 months ago
0
Failed to stat file model.onxx while using conda-pack in configs

#7531 Spectra456 opened 3 months ago
1
Support passing variables in config.pbtxt

#7530 riZZZhik opened 3 months ago
0
docs: Triton TRT-LLM user guide

#7529 krishung5 closed 3 months ago
0
vllm backend - UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'numpy'

#7528 dhanushSB96 closed 3 months ago
1
test: Load new model version should not reload loaded existing model version(s)

#7527 kthui closed 3 months ago
0
How to send the byte or string data in array in perf analyzer

#7526 Kanupriyagoyal opened 3 months ago
3
test: Test histogram metric

#7525 yinggeh closed 3 months ago
0
build: RHEL8 PyTorch Backend

#7524 fpetrini15 closed 3 months ago
0
ValidateBytesInputs() check failed in Big Endian Machines

#7523 Hemaprasannakc opened 3 months ago
2
CI/Build: Pre-Release Changes for 24.08

#7522 pvijayakrish closed 3 months ago
1
24.08 Changes

#7521 pvijayakrish closed 3 months ago
0
How to use StopStream when use AsyncStreamInfer?

#7520 tricky61 opened 3 months ago
0
build: RHEL 8 Compatibility

#7519 nv-kmcgill53 closed 3 months ago
0
triton need api docs like vllm fastapi docs

#7518 kinglion811 opened 3 months ago
1
Stateful decoupled bls model: malloc_consolidate(): unaligned fastbin chunk detected

#7517 007durgesh219 opened 3 months ago
0
High GPU memory use

#7516 cile98 opened 3 months ago
0
SSLEOFError when result from async_infer is not available in http client

#7515 briedel opened 3 months ago
0
Docker build of Triton Server r24.07 on Ubuntu 22.04/Arm fails

#7513 goetzrieger opened 3 months ago
6