triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License · 8.38k stars · 1.49k forks
Issues
#7727 · Memory Leak in NVIDIA Triton Server (v24.09-py3) with model-control-mode=explicit · Mustafiz48 · opened 1 month ago · 6 comments
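For readers landing on #7727: a server started with --model-control-mode=explicit only loads and unloads models on request, so leaks of this kind typically surface over repeated load/unload cycles. Below is a minimal sketch of that workflow using tritonclient; the endpoint address and model name are illustrative assumptions, not details taken from the report.

```python
# Minimal sketch (assumed setup): server started with
#   tritonserver --model-control-mode=explicit --model-repository=/models
# Repeatedly load and unload a model while watching the server's memory (RSS).
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")  # assumed address
MODEL = "my_model"  # hypothetical model name

for _ in range(100):
    client.load_model(MODEL)    # POST v2/repository/models/<name>/load
    assert client.is_model_ready(MODEL)
    client.unload_model(MODEL)  # POST v2/repository/models/<name>/unload
```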
#7726 · Unrecognized configuration class to build an AutoTokenizer for microsoft/Florence-2-base-ft · shihao28 · closed 3 weeks ago · 1 comment
#7725 · Build: Update OpenVINO and vLLM versions for Release 24.10 · pvijayakrish · closed 1 month ago · 0 comments
#7724 · No content returned with OpenAI-Compatible Frontend Beta · Loc8888 · closed 1 month ago · 1 comment
#7723 · Caught signal 11 (Segmentation fault: address not mapped to object at address 0x1c0) · wxk-cmd · opened 1 month ago · 1 comment
#7722 · Facing import error in python backend on Apple M2/M3 chips · TheMightyRaider · opened 1 month ago · 3 comments
Revert "Change compute capablity min value (#7708)"
#7721
mc-nv
closed
1 month ago
0
#7720 · refactor: moving `tritonfrontend` to `@handle_triton_error` decorator · KrishnanPrash · closed 1 month ago · 0 comments
#7719 · ONNX CUDA session not working in python backend · jsoto-gladia · opened 1 month ago · 3 comments
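For context on #7719: in the Python backend, the onnxruntime session is created inside the model's model.py, so a non-working CUDA session usually comes down to the providers list or the environment inside the backend's Python stub. A minimal sketch of that setup follows; the model filename and the "INPUT0"/"OUTPUT0" tensor names are assumptions for illustration, not taken from the issue.

```python
# model.py — sketch of a Triton Python-backend model wrapping an onnxruntime
# session on CUDA. Assumes a single-input, single-output ONNX graph whose
# tensor names match the hypothetical "INPUT0"/"OUTPUT0" used below.
import os

import onnxruntime as ort
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # args["model_repository"] / args["model_version"] locate this model's files.
        path = os.path.join(args["model_repository"], args["model_version"], "model.onnx")
        # If CUDAExecutionProvider fails to load (one common cause of this
        # symptom), onnxruntime falls back to CPU; check session.get_providers().
        self.session = ort.InferenceSession(
            path, providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
        )

    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            (out,) = self.session.run(None, {"INPUT0": in0})  # single output assumed
            responses.append(
                pb_utils.InferenceResponse(
                    output_tensors=[pb_utils.Tensor("OUTPUT0", out)]
                )
            )
        return responses
```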
#7718 · [Bug] Error when serving Torch-TensorRT JIT model to Nvidia-Triton · zmy1116 · opened 1 month ago · 0 comments
#7717 · Removing caching on Windows · mc-nv · closed 3 weeks ago · 2 comments
#7716 · Wget onnx model fails for end-to-end example · shenj68 · closed 3 weeks ago · 1 comment
#7715 · docs: Add support matrix for model parallelism in OpenAI Frontend · rmccorm4 · closed 1 month ago · 0 comments
#7714 · Does Nvidia Triton Inference Server Support AutoML framework? · IamExperimenting · opened 1 month ago · 0 comments
#7713 · nv_inference_request_failure metric does not increase · vpvpvpvp · opened 1 month ago · 0 comments
#7712 · [Do not merge!] Build: Remove TRT model generation for V100 · pvijayakrish · opened 1 month ago · 0 comments
#7711 · docs: Clarify meanings of ensemble key and value · kthui · closed 1 month ago · 0 comments
#7710 · chore: Fix genai-perf command and add missing copyrights · rmccorm4 · closed 1 month ago · 0 comments
#7709 · fix: Fix L0_perf_nomodel shared memory · kthui · closed 1 month ago · 3 comments
#7708 · Change compute capability min value · mc-nv · closed 1 month ago · 0 comments
#7707 · test: Add L0_additional_dependency_dirs · fpetrini15 · closed 1 month ago · 0 comments
#7706 · How to maximize single-model inference performance · lei1liu · opened 1 month ago · 0 comments
#7705 · fix: Split L0_nomodel_perf into 2 tests to ensure better debuggability and resource utilization for PA · indrajit96 · opened 1 month ago · 0 comments
#7704 · test: Update server repo for some tests · jbkyang-nvi · closed 1 month ago · 0 comments
#7703 · feat: Metrics Support in `tritonfrontend` · KrishnanPrash · closed 3 weeks ago · 0 comments
#7702 · Implementing Model Deployments at Scale Using Kubernetes with Triton Server and MLflow Pipelines · haridassaiprakash · opened 1 month ago · 0 comments
#7701 · test: Allow ensemble to create the final response even if some of the outputs are not created · kthui · closed 1 month ago · 0 comments
#7700 · fix: Fix bug when targeting the TRT-LLM backend ensemble · blongnv · closed 1 month ago · 2 comments
#7699 · fix: Re-enables copyright hook, updates GitHub Action to only run pre-commi… · pranavm-nvidia · closed 1 month ago · 0 comments
#7698 · Does Triton support multiple TensorFlow backends simultaneously? · ragavendrams · opened 1 month ago · 0 comments
#7697 · test: TC for Metric P0 nv_load_time per model · indrajit96 · opened 1 month ago · 0 comments
#7695 · docs: Add beta note to OpenAI compatible API · rmccorm4 · closed 1 month ago · 0 comments
#7694 · test: Test and document histogram latency metrics · yinggeh · closed 1 month ago · 0 comments
#7693 · Build: Update TRT release branch referenced in model gen file · pvijayakrish · opened 1 month ago · 1 comment
#7692 · What's the query to calculate Triton model latency per request? Is it nv_inference_request_duration_us / nv_inference_exec_count + nv_inference_queue_duration_us? · jayakommuru · opened 1 month ago · 1 comment
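On #7692's proposed formula: as documented, nv_inference_request_duration_us is a cumulative counter of end-to-end request handling time (queue time included), so adding nv_inference_queue_duration_us would count queuing twice, and dividing by nv_inference_exec_count yields time per (possibly batched) execution rather than per request. A per-request average is instead the delta of the duration counter over the delta of a request counter such as nv_inference_request_success. A sketch follows, assuming the default metrics endpoint on port 8002 and a placeholder model name:

```python
# Sketch: estimate average end-to-end latency per request from Triton's
# Prometheus counters. Endpoint and model name below are assumptions.
import time
import urllib.request

METRICS_URL = "http://localhost:8002/metrics"  # default Triton metrics port

def scrape(metric, model):
    """Return the current value of a counter for the given model label."""
    text = urllib.request.urlopen(METRICS_URL).read().decode()
    for line in text.splitlines():
        if line.startswith(metric) and f'model="{model}"' in line:
            return float(line.rsplit(" ", 1)[1])
    raise KeyError(f"{metric} not found for {model}")

def avg_latency_us(model, window_s=10.0):
    # Counters are cumulative, so take deltas over an observation window
    # during which the model is actually receiving requests.
    d0 = scrape("nv_inference_request_duration_us", model)
    n0 = scrape("nv_inference_request_success", model)
    time.sleep(window_s)
    d1 = scrape("nv_inference_request_duration_us", model)
    n1 = scrape("nv_inference_request_success", model)
    return (d1 - d0) / max(n1 - n0, 1)

print(f"avg latency: {avg_latency_us('my_model'):.0f} us")  # hypothetical model
```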
#7691 · docs: Add example outputs to OpenAI Frontend docs · KrishnanPrash · closed 1 month ago · 1 comment
#7690 · Hangs when using the Triton client and multiprocessing simultaneously · Soul-Code · opened 1 month ago · 0 comments
#7689 · Segmentation fault · lizhenneng · closed 3 weeks ago · 2 comments
#7688 · Possible bug in reference counting with shared memory regions · hcho3 · opened 1 month ago · 1 comment
#7687 · Update README and versions for 24.10 · pvijayakrish · closed 1 month ago · 0 comments
#7686 · Build: Update README and versions for 24.10 · pvijayakrish · opened 1 month ago · 1 comment
#7685 · Converting a .pth model to a Triton .pt model failed · linsistqb · opened 1 month ago · 0 comments
#7684 · test: Enhance Python gRPC streaming test to send multiple requests · kthui · closed 1 month ago · 1 comment
#7683 · refactor: Removing `Server` subclass from `tritonfrontend` · KrishnanPrash · closed 1 month ago · 0 comments
#7682 · fix: Support sampling parameters of type List for vLLM backend (stop words) · rmccorm4 · closed 1 month ago · 0 comments
#7681 · build: Adding `tritonfrontend` to `build.py` · KrishnanPrash · closed 1 month ago · 1 comment
#7680 · Ability to cast between data types within a backend · kronoker · opened 1 month ago · 0 comments
#7678 · Are FP8 models supported in Triton? · jayakommuru · opened 1 month ago · 7 comments
#7677 · Triton ONNX Runtime backend slower than the onnxruntime Python client on CPU · Mitix-EPI · opened 1 month ago · 2 comments
#7676 · Dynamic batching not working with TRT-LLM backend · ShuaiShao93 · closed 1 month ago · 1 comment