replicate/cog-triton
A cog implementation of Nvidia's Triton server
Apache License 2.0
12 stars, 0 forks
issues
#52 tensorrt-llm 0.12.0.dev2024073000, triton 2.46.0 (yorickvP, opened, 3 months ago, 0 comments)
#51 Build failure with 0.12pre: cannot find libnvinfer_builder_resource_win.so.10.2.0 (joehoover, closed, 4 months ago, 1 comment)
#50 tensorrt-llm: 0.9 -> 0.10, triton: 2.42.0 -> 2.44.0 (yorickvP, opened, 4 months ago, 1 comment)
#49 tweak ci (technillogue, closed, 5 months ago, 0 comments)
#48 Stop sequences fail for some sequences (joehoover, opened, 5 months ago, 1 comment)
#47 don't raise an error if min/max tokens are both set to the same value (technillogue, closed, 5 months ago, 0 comments)
#46 better prompt errors (technillogue, closed, 5 months ago, 0 comments)
#45 a basic check at startup (technillogue, closed, 5 months ago, 1 comment)
#44 reuse downloaded weights (technillogue, closed, 5 months ago, 0 comments)
#43 better errors (technillogue, closed, 5 months ago, 0 comments)
#42 handle triton 0.10.0 not returning the entire sequence (technillogue, closed, 4 months ago, 0 comments)
#41 emit token count metrics and upgrade cog (technillogue, closed, 6 months ago, 0 comments)
#40 add `messages` input for chat formatting (technillogue, opened, 6 months ago, 0 comments)
#39 Update nix CI so that runner-86 is pushed to it's (joehoover, closed, 6 months ago, 0 comments)
#38 merge cog-trt-llm into this repo (yorickvP, closed, 6 months ago, 1 comment)
#37 fix max tokens (and optimize imports) (technillogue, closed, 6 months ago, 0 comments)
#36 Yorickvp/tokenizers 0 19 (technillogue, closed, 6 months ago, 0 comments)
#35 Backport some changes from trtllm-0.9 branch (yorickvP, closed, 6 months ago, 0 comments)
#34 ensure that pad and end id are loaded as ints (joehoover, closed, 7 months ago, 0 comments)
#33 Update tensorrt-llm to v0.9.0 (yorickvP, closed, 6 months ago, 0 comments)
#32 Build in github actions (yorickvP, closed, 7 months ago, 0 comments)
#31 Merge nvidia-*-cu12 python with nix's cudaPackages (yorickvP, closed, 7 months ago, 0 comments)
#30 ensure that max/min new tokens doesn't exceed max seq len (joehoover, closed, 7 months ago, 0 comments)
#29 Smaller Images (yorickvP, closed, 7 months ago, 0 comments)
#28 Joe/yorickvp/ci/joe/lang 218 investigate tps performance degradation (joehoover, closed, 7 months ago, 1 comment)
#27 Joe/yorickvp/ci/joe/lang 220 identify cog triton tps bottleneck (joehoover, closed, 7 months ago, 0 comments)
#26 superficial change to allow merge (joehoover, closed, 7 months ago, 0 comments)
#25 fix dockerfile for pip installing trt-llm -- fix for mpi path (joehoover, closed, 7 months ago, 0 comments)
#24 Joe/lang 214 add mock cog triton concurrency test to cog triton directory (joehoover, closed, 7 months ago, 0 comments)
#23 Unify cog-triton Dockerfile LANG-213 (joehoover, closed, 7 months ago, 0 comments)
#22 restart triton during setup if it crashes or doesn't start within 3 minutes (technillogue, closed, 8 months ago, 0 comments)
#21 catch event keyerror (technillogue, closed, 8 months ago, 0 comments)
#20 Joe/lang 207 make cod triton predict signature compatible with current (joehoover, closed, 8 months ago, 0 comments)
#19 copy triton_templates in Dockerfile so we can use it to build configs… (joehoover, closed, 8 months ago, 0 comments)
#18 Joe/lang 205 make triton configuration configurable during predict setup (joehoover, closed, 8 months ago, 0 comments)
#17 Joe/lang 205 make triton configuration configurable during predict setup (joehoover, closed, 8 months ago, 0 comments)
#16 Return logits when return_logits is set to true. (manishravula, closed, 4 months ago, 1 comment)
#15 update cog to use a log method and increase timeout (technillogue, closed, 8 months ago, 0 comments)
#14 Joe/improve benchmark script (joehoover, closed, 8 months ago, 0 comments)
#13 Joe/build triton main (joehoover, closed, 8 months ago, 0 comments)
#12 attempt to restart triton (technillogue, closed, 4 months ago, 1 comment)
#11 remove unused eos class handler attribute (joehoover, closed, 9 months ago, 0 comments)
#10 operate on token strings instead of strings (joehoover, closed, 9 months ago, 0 comments)
#9 Joe/lang 200 patch trt llm triton backend stop sequences (joehoover, closed, 9 months ago, 0 comments)
#8 Joe/lang 197 llama generation does not stop when it should eos problem (joehoover, closed, 9 months ago, 1 comment)
#7 Joe/lang 194 implement model specific prompt formatting for cog triton (joehoover, closed, 9 months ago, 1 comment)
#6 don't delete downloaded weights if they are present, so that people can use volumes (technillogue, closed, 9 months ago, 1 comment)
#5 Joe/lang 193 fix mistral decoding (joehoover, closed, 9 months ago, 1 comment)
#4 Refactor predict method to include additional arguments. Add health c… (joehoover, closed, 9 months ago, 1 comment)
#3 concurrency: 64 (technillogue, closed, 9 months ago, 0 comments)