triton-inference-server / tensorrtllm_backend
The Triton TensorRT-LLM Backend · Apache License 2.0 · 581 stars · 81 forks
Issues
Sorted by: Newest
#415 · The crash occurred when attempting to quantize the LLaMA model with W4A(fp)8_AWQ · pandengyao · closed 2 months ago · 4 comments
#413 · The result of using inflight_batcher_llm_client to send multiple LoRA weights is not the same as using tensorrtllm · stifles · opened 2 months ago · 3 comments
#412 · Feature Request: Set maximum number of in flight · TheCodeWrangler · opened 2 months ago · 0 comments
#411 · Block reuse is currently not supported with beam width > 1 · tonylek · opened 2 months ago · 3 comments
#410 · Update TensorRT-LLM backend · kaiyux · closed 2 months ago · 0 comments
#409 · Update end_to_end_test.py · r0cketdyne · opened 2 months ago · 0 comments
#422 · Dynamic batching not working properly with tensorrtllm_backend · gavinzb · opened 2 months ago · 3 comments
#408 · Supporting beam search in streaming mode · tonylek · opened 2 months ago · 0 comments
#407 · Update TensorRT-LLM backend · kaiyux · closed 2 months ago · 0 comments
#406 · lora_task_id, lora_weights, lora_config not found in all_models/inflight_batcher_llm/tensorrt_llm_bls/1/lib/decode.py · liao217 · opened 2 months ago · 0 comments
#404 · Limited batched streaming when using inflight batching · vnkc1 · closed 2 months ago · 3 comments
#403 · Support bfloat16 LoRA Adaptors · TheCodeWrangler · opened 2 months ago · 5 comments
#402 · Feature request: support 'max_input_len' and 'max_num_tokens' in config.pbtxt of tensorrtllm · Saigut · closed 2 months ago · 4 comments
#401 · The tensorrtllm and onnxruntime backends · tricky61 · closed 2 months ago · 2 comments
#399 · Example of LoRA weights · TheCodeWrangler · opened 2 months ago · 2 comments
#398 · There is no option to set world_size in config file in model repository · Saigut · closed 2 months ago · 4 comments
#397 · Update TensorRT-LLM backend · kaiyux · closed 2 months ago · 0 comments
#396 · Can't launch tensorrtllm_backend Triton server in 24.02-trtllm-python-py3 · Saigut · closed 2 months ago · 2 comments
#395 · [Documentation improvement] Improve README for tensorrtllm_backend - v0.8.0 · kelkarn · opened 2 months ago · 0 comments
#394 · Build via Docker is much bigger than the image in NGC · ZJU-lishuang · opened 2 months ago · 5 comments
#393 · Question about "model not found" · geraldstanje · opened 2 months ago · 1 comment
#392 · Crash with high request concurrency · silverriver · opened 2 months ago · 3 comments
#391 · Can the tensorrtllm backend support LogitsProcessor? · Muxv · opened 2 months ago · 1 comment
#390 · TensorRT-LLM often hangs using both `tp_size 2` and `enable_context_fmha` · lkm2835 · opened 2 months ago · 2 comments
#389 · Confusion about versions and NGC images · mbahri · opened 2 months ago · 4 comments
#388 · Safetensors and OpenAI-style endpoint · RonanKMcGovern · opened 3 months ago · 3 comments
#387 · Update TensorRT-LLM backend · kaiyux · closed 3 months ago · 0 comments
#386 · RemoteDisconnected('Remote end closed connection without response') · trillionmonster · opened 3 months ago · 1 comment
#385 · Can't build Docker image with Ryzen 5950X · mallorbc · opened 3 months ago · 2 comments
#384 · Update TensorRT-LLM backend · Shixiaowei02 · closed 3 months ago · 0 comments
#383 · [Question] How to know when inference has finished with the generate_stream API? · activezhao · opened 3 months ago · 5 comments
#382 · Right- vs. left-sided padding in TensorRT-LLM backend examples · JohnGiorgi · closed 2 months ago · 2 comments
#381 · Crashes for long context requests · Pernekhan · opened 3 months ago · 15 comments
#380 · Update TensorRT-LLM backend · kaiyux · closed 3 months ago · 0 comments
#379 · Deployment of TensorRT-LLM Model on Triton Server · jasonngap1 · closed 3 months ago · 2 comments
#378 · Verify if inflight batching is running · bprus · closed 4 weeks ago · 3 comments
#377 · Triton Server Hangs when loading multi-GPU LLAMA2 Engine · mindhash · opened 3 months ago · 1 comment
#374 · convert_checkpoint.py not working - safetensors_rust.SafetensorError: Error while deserializing header: InvalidHeaderDeserialization · saurabhbhagwat · opened 3 months ago · 3 comments
#373 · Update TensorRT-LLM backend · kaiyux · closed 3 months ago · 0 comments
#372 · Invalid argument: model input cannot have empty reshape for non-batching model as scalar tensors are not supported for tensorrt_llm · mse700 · closed 3 months ago · 0 comments
#371 · tritonserver crashes (SIGNAL 11) when OpenTelemetry trace is enabled for the trtllm backend · npuichigo · closed 2 months ago · 3 comments
#370 · Getting gemmPlugin error for Mixtral model on v0.8.0 while hosting on Triton server · sarthak-phatate · opened 3 months ago · 1 comment
#369 · Infer failed: Unable to parse 'data': Shape does not match true shape of 'data' field in generate endpoint · bprus · opened 3 months ago · 0 comments
#368 · Using BERT/RoBERTa with the "tensorrtllm" backend directly? (no Python lib like the tensorrt-llm package) · pommedeterresautee · opened 3 months ago · 5 comments
#367 · Streaming mode doesn't work · dongteng · opened 3 months ago · 2 comments
#366 · Memory available for KV cache using the Triton TRT-LLM backend is lower than using TRT-LLM directly · UnyieldingOrca · opened 3 months ago · 3 comments
#365 · Typo in README decoupled mode: make text consistent for the boolean variable · esnvidia · opened 3 months ago · 1 comment
#364 · [BUG] Missing `tokenizer_type` parameter in config.pbtxt · esnvidia · opened 3 months ago · 1 comment
#363 · CUDA runtime error in cudaDeviceGetDefaultMemPool · tobernat · opened 3 months ago · 7 comments
#362 · Update TensorRT-LLM backend · kaiyux · closed 3 months ago · 0 comments