npuichigo/openai_trtllm: Issues
OpenAI compatible API for TensorRT LLM triton backend
MIT License · 177 stars · 27 forks
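Since openai_trtllm fronts a Triton TensorRT-LLM backend with an OpenAI-compatible API (which is what most of the issues below exercise), the request shape such a server accepts can be sketched as follows. This assumes the standard OpenAI chat-completions schema; the default model name "ensemble" and the localhost address in the comment are illustrative assumptions, not values taken from this repository.

```python
import json

# Minimal sketch of the JSON body an OpenAI-compatible
# /v1/chat/completions request carries. Field names follow the public
# OpenAI API; the model name "ensemble" is an assumption here, since
# the Triton model name depends on your deployment.
def build_chat_request(prompt: str, model: str = "ensemble") -> str:
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return json.dumps(body)

# The serialized body would be POSTed to the proxy, e.g.
# http://localhost:3000/v1/chat/completions (address is an assumption).
print(build_chat_request("Hello!"))
```

Because the wire format matches OpenAI's, stock OpenAI client libraries can usually be pointed at such a proxy by overriding only the base URL.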
#56  Response parameter CompletionUsage return 0 token consumption (moyerlee, opened 1 month ago, 1 comment)
#55  failed to run custom build command for `openai_trtllm v0.2.1 (/root/openai_trtllm)` (moyerlee, closed 1 month ago, 2 comments)
#54  Support for model in nemo format (Minhhnh, opened 2 months ago, 7 comments)
#53  openai_trtllm return 200 directly to the client when TTFT is greater than 15 seconds (mynameiskeen, opened 3 months ago, 1 comment)
#52  Support for Llama 3.1 (datdo-msft, opened 3 months ago, 0 comments)
#51  Fix dynamic length of stop words. (#50) (Raphael-Jin, closed 4 months ago, 0 comments)
#50  Multiple stop_words does not work (Raphael-Jin, closed 3 months ago, 3 comments)
#49  Add API Key configuration option for authorization (sampritipanda, closed 4 months ago, 0 comments)
#48  all option is same as openai? (dongs0104, opened 4 months ago, 6 comments)
#47  Feature request - Add all v1/ routes (visitsb, opened 5 months ago, 3 comments)
#46  Missing spaces (Mary-Sam, opened 5 months ago, 2 comments)
#45  Triton Parameter with OpenAI endpoint (FernandoDorado, closed 5 months ago, 5 comments)
#44  llama 3 tokenizer no longer works - updated eos token (avianion, opened 6 months ago, 5 comments)
#43  support for llama 3 (avianion, opened 6 months ago, 4 comments)
#42  Switch to use NGC official docker in docker-compose.yml (npuichigo, closed 7 months ago, 0 comments)
#41  Refine README to include recent changes (npuichigo, closed 7 months ago, 0 comments)
#40  Clarify README history-template and add llama3 chat template (Vokturz, closed 7 months ago, 1 comment)
#39  Can't able to connect to triton (tapansstardog, closed 7 months ago, 9 comments)
#38  ERROR: expected number of inputs between 1 and 3 but got 9 inputs for model (samzong, opened 7 months ago, 8 comments)
#37  Output nothing but the gpu was working (jaywongs, closed 7 months ago, 9 comments)
#36  How to add username/password into client/openai_completion.py when calling triton inference server? (zengqingfu1442, closed 8 months ago, 2 comments)
#35  The llm model must be served by triton inference server' ensemble scheduler? (zengqingfu1442, closed 7 months ago, 4 comments)
#34  Use docstring for help message (npuichigo, closed 8 months ago, 0 comments)
#33  Update README.md (npuichigo, closed 8 months ago, 0 comments)
#32  Refine help string and document (npuichigo, closed 8 months ago, 0 comments)
#31  An error has occured while running client example "peer closed connection without sending complete message body (incomplete chunked read)" (plt12138, closed 8 months ago, 10 comments)
#30  [codellama] There is no space between each words (charllll, closed 6 months ago, 8 comments)
#29  Rich debug info (npuichigo, closed 8 months ago, 0 comments)
#28  Fix baichuan template (npuichigo, closed 8 months ago, 0 comments)
#27  Add history template for baichuan (npuichigo, closed 8 months ago, 0 comments)
#26  feat: added functionality to customize prompt (pdylanross, closed 8 months ago, 3 comments)
#25  Propagate opentelemetry context to triton inference server for better tracing (npuichigo, closed 8 months ago, 0 comments)
#24  Remove unnecessary code in TensorRT LLM 0.8.0 (npuichigo, closed 8 months ago, 0 comments)
#23  Suport for different chat format (rawk-v, closed 8 months ago, 4 comments)
#22  Update telemetry.rs (npuichigo, closed 10 months ago, 0 comments)
#21  Update README.md (npuichigo, closed 10 months ago, 0 comments)
#20  Update to use axum 0.7 and enhance telemetry (npuichigo, closed 10 months ago, 0 comments)
#19  Update startup.rs (npuichigo, closed 10 months ago, 0 comments)
#18  Is there a Python version? (hijkzzz, closed 11 months ago, 0 comments)
#17  Add LICENSE (npuichigo, closed 11 months ago, 0 comments)
#16  License (csmileyk, closed 10 months ago, 3 comments)
#15  Fix triton server launching (npuichigo, closed 11 months ago, 0 comments)
#14  Make it LangChain compatible (npuichigo, closed 11 months ago, 0 comments)
#13  Add more openai compatible parameters (npuichigo, closed 12 months ago, 0 comments)
#12  Add a demo gif in README (npuichigo, closed 12 months ago, 0 comments)
#11  Fix lfs (npuichigo, closed 12 months ago, 0 comments)
#10  Fix readme (npuichigo, closed 12 months ago, 0 comments)
#9   Disable git lfs and make it simple (npuichigo, closed 12 months ago, 0 comments)
#8   Disable lfs and make it simple (npuichigo, closed 12 months ago, 0 comments)
#7   Add tutorial and example (npuichigo, closed 12 months ago, 0 comments)