triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License · 8.38k stars · 1.49k forks
Issues
#7727 · Memory Leak in NVIDIA Triton Server (v24.09-py3) with model-control-mode=explicit · Mustafiz48 · opened 1 month ago · 6 comments
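For readers landing on #7727: a server started with --model-control-mode=explicit only loads and unloads models on request, so leaks of this kind typically surface over repeated load/unload cycles. Below is a minimal sketch of that workflow using tritonclient; the endpoint address and model name are illustrative assumptions, not details taken from the report.

```python
# Minimal sketch (assumed setup): server started with
#   tritonserver --model-control-mode=explicit --model-repository=/models
# Repeatedly load and unload a model while watching the server's memory (RSS).
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")  # assumed address
MODEL = "my_model"  # hypothetical model name

for _ in range(100):
    client.load_model(MODEL)    # POST v2/repository/models/<name>/load
    assert client.is_model_ready(MODEL)
    client.unload_model(MODEL)  # POST v2/repository/models/<name>/unload
```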
#7726 · Unrecognized configuration class to build an AutoTokenizer for microsoft/Florence-2-base-ft · shihao28 · closed 3 weeks ago · 1 comment
#7725 · Build: Update OpenVINO and vLLM versions for Release 24.10 · pvijayakrish · closed 1 month ago · 0 comments
#7724 · No content returned with OpenAI-Compatible Frontend Beta · Loc8888 · closed 1 month ago · 1 comment
#7723 · Caught signal 11 (Segmentation fault: address not mapped to object at address 0x1c0) · wxk-cmd · opened 1 month ago · 1 comment
#7722 · Facing import error in python backend on Apple M2/M3 chips · TheMightyRaider · opened 1 month ago · 3 comments
Revert "Change compute capablity min value (#7708)"
#7721
mc-nv
closed
1 month ago
0
#7720 · refactor: moving `tritonfrontend` to `@handle_triton_error` decorator · KrishnanPrash · closed 1 month ago · 0 comments
#7719 · ONNX CUDA session not working in python backend · jsoto-gladia · opened 1 month ago · 3 comments
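For context on #7719: in the Python backend, the onnxruntime session is created inside the model's model.py, so a non-working CUDA session usually comes down to the providers list or the environment inside the backend's Python stub. A minimal sketch of that setup follows; the model filename and the "INPUT0"/"OUTPUT0" tensor names are assumptions for illustration, not taken from the issue.

```python
# model.py — sketch of a Triton Python-backend model wrapping an onnxruntime
# session on CUDA. Assumes a single-input, single-output ONNX graph whose
# tensor names match the hypothetical "INPUT0"/"OUTPUT0" used below.
import os

import onnxruntime as ort
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # args["model_repository"] / args["model_version"] locate this model's files.
        path = os.path.join(args["model_repository"], args["model_version"], "model.onnx")
        # If CUDAExecutionProvider fails to load (one common cause of this
        # symptom), onnxruntime falls back to CPU; check session.get_providers().
        self.session = ort.InferenceSession(
            path, providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
        )

    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            (out,) = self.session.run(None, {"INPUT0": in0})  # single output assumed
            responses.append(
                pb_utils.InferenceResponse(
                    output_tensors=[pb_utils.Tensor("OUTPUT0", out)]
                )
            )
        return responses
```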
#7718 · [Bug] Error when serving Torch-TensorRT JIT model to Nvidia-Triton · zmy1116 · opened 1 month ago · 0 comments
#7717 · Removing caching on Windows · mc-nv · closed 3 weeks ago · 2 comments
#7716 · Wget onnx model fails for end-to-end example · shenj68 · closed 3 weeks ago · 1 comment
#7715 · docs: Add support matrix for model parallelism in OpenAI Frontend · rmccorm4 · closed 1 month ago · 0 comments
#7714 · Does Nvidia Triton Inference Server Support AutoML framework? · IamExperimenting · opened 1 month ago · 0 comments
#7713 · nv_inference_request_failure metric does not increase · vpvpvpvp · opened 1 month ago · 0 comments
#7712 · [Do not merge!] Build: Remove TRT model generation for V100 · pvijayakrish · opened 1 month ago · 0 comments
#7711 · docs: Clarify meanings of ensemble key and value · kthui · closed 1 month ago · 0 comments
#7710 · chore: Fix genai-perf command and add missing copyrights · rmccorm4 · closed 1 month ago · 0 comments
#7709 · fix: Fix L0_perf_nomodel shared memory · kthui · closed 1 month ago · 3 comments
#7708 · Change compute capability min value · mc-nv · closed 1 month ago · 0 comments
#7707 · test: Add L0_additional_dependency_dirs · fpetrini15 · closed 1 month ago · 0 comments
#7706 · How to maximize single-model inference performance · lei1liu · opened 1 month ago · 0 comments
#7705 · fix: Split L0_nomodel_perf into 2 tests to ensure better debuggability and resource utilization for PA · indrajit96 · opened 1 month ago · 0 comments
#7704 · test: Update server repo for some tests · jbkyang-nvi · closed 1 month ago · 0 comments
#7703 · feat: Metrics Support in `tritonfrontend` · KrishnanPrash · closed 3 weeks ago · 0 comments
#7702 · Implementing Model Deployments at Scale Using Kubernetes with Triton Server and MLflow Pipelines · haridassaiprakash · opened 1 month ago · 0 comments
#7701 · test: Allow ensemble to create the final response even if some of the outputs are not created · kthui · closed 1 month ago · 0 comments
#7700 · fix: Fix bug when targeting the TRT-LLM backend ensemble · blongnv · closed 1 month ago · 2 comments
#7699 · fix: Re-enables copyright hook, updates GitHub Action to only run pre-commi… · pranavm-nvidia · closed 1 month ago · 0 comments
#7698 · Does Triton support multiple TensorFlow backends simultaneously? · ragavendrams · opened 1 month ago · 0 comments
#7697 · test: TC for Metric P0 nv_load_time per model · indrajit96 · opened 1 month ago · 0 comments
#7695 · docs: Add beta note to OpenAI compatible API · rmccorm4 · closed 1 month ago · 0 comments
#7694 · test: Test and document histogram latency metrics · yinggeh · closed 1 month ago · 0 comments
#7693 · Build: Update TRT release branch referenced in model gen file · pvijayakrish · opened 1 month ago · 1 comment
#7692 · What's the query to calculate Triton model latency per request? Is it nv_inference_request_duration_us / nv_inference_exec_count + nv_inference_queue_duration_us? · jayakommuru · opened 1 month ago · 1 comment
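On #7692's proposed formula: as documented, nv_inference_request_duration_us is a cumulative counter of end-to-end request handling time (queue time included), so adding nv_inference_queue_duration_us would count queuing twice, and dividing by nv_inference_exec_count yields time per (possibly batched) execution rather than per request. A per-request average is instead the delta of the duration counter over the delta of a request counter such as nv_inference_request_success. A sketch follows, assuming the default metrics endpoint on port 8002 and a placeholder model name:

```python
# Sketch: estimate average end-to-end latency per request from Triton's
# Prometheus counters. Endpoint and model name below are assumptions.
import time
import urllib.request

METRICS_URL = "http://localhost:8002/metrics"  # default Triton metrics port

def scrape(metric, model):
    """Return the current value of a counter for the given model label."""
    text = urllib.request.urlopen(METRICS_URL).read().decode()
    for line in text.splitlines():
        if line.startswith(metric) and f'model="{model}"' in line:
            return float(line.rsplit(" ", 1)[1])
    raise KeyError(f"{metric} not found for {model}")

def avg_latency_us(model, window_s=10.0):
    # Counters are cumulative, so take deltas over an observation window
    # during which the model is actually receiving requests.
    d0 = scrape("nv_inference_request_duration_us", model)
    n0 = scrape("nv_inference_request_success", model)
    time.sleep(window_s)
    d1 = scrape("nv_inference_request_duration_us", model)
    n1 = scrape("nv_inference_request_success", model)
    return (d1 - d0) / max(n1 - n0, 1)

print(f"avg latency: {avg_latency_us('my_model'):.0f} us")  # hypothetical model
```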
#7691 · docs: Add example outputs to OpenAI Frontend docs · KrishnanPrash · closed 1 month ago · 1 comment
#7690 · Hangs when using the Triton client and multiprocessing simultaneously · Soul-Code · opened 1 month ago · 0 comments
#7689 · Segmentation fault · lizhenneng · closed 3 weeks ago · 2 comments
#7688 · Possible bug in reference counting with shared memory regions · hcho3 · opened 1 month ago · 1 comment
#7687 · Update README and versions for 24.10 · pvijayakrish · closed 1 month ago · 0 comments
#7686 · Build: Update README and versions for 24.10 · pvijayakrish · opened 1 month ago · 1 comment
#7685 · Converting a .pth model to a Triton .pt model failed · linsistqb · opened 1 month ago · 0 comments
#7684 · test: Enhance Python gRPC streaming test to send multiple requests · kthui · closed 1 month ago · 1 comment
#7683 · refactor: Removing `Server` subclass from `tritonfrontend` · KrishnanPrash · closed 1 month ago · 0 comments
#7682 · fix: Support sampling parameters of type List for vLLM backend (stop words) · rmccorm4 · closed 1 month ago · 0 comments
#7681 · build: Adding `tritonfrontend` to `build.py` · KrishnanPrash · closed 1 month ago · 1 comment
#7680 · Ability to cast between data types within a backend · kronoker · opened 1 month ago · 0 comments
#7678 · Are FP8 models supported in Triton? · jayakommuru · opened 1 month ago · 7 comments
#7677 · Triton ONNX Runtime backend slower than the onnxruntime Python client on CPU · Mitix-EPI · opened 1 month ago · 2 comments
#7676 · Dynamic batching not working with TRT-LLM backend · ShuaiShao93 · closed 1 month ago · 1 comment