triton-inference-server / fastertransformer_backend
BSD 3-Clause "New" or "Revised" License · 411 stars · 134 forks
Issues
#176 · SNOW-1752601 libarchive CVE · sfc-gh-dbove · closed 1 month ago · 0 comments
#175 · SNOW-1455266: Upgrade Triton to resolve CVEs · sfc-gh-dbove · opened 1 month ago · 0 comments
#174 · Free buffer in Llama forward pass · sfc-gh-ybsat · closed 1 year ago · 0 comments
#173 · tritonserver version · double-vin · opened 1 year ago · 0 comments
#172 · Add langid Python dependency · sfc-gh-ybsat · closed 1 year ago · 0 comments
#171 · Does FasterTransformer support GPT-2 classification models, such as GPT2ForSequenceClassification? · cabbagetalk · opened 1 year ago · 0 comments
#170 · Batcher that doesn't merge batches · sfc-gh-bprosnitz · closed 1 year ago · 0 comments
#169 · No response is received during inference in decoupled mode · amazingkmy · opened 1 year ago · 0 comments
#168 · What is the use of preprocessing & postprocessing? Can I start FasterTransformer for only the BLOOM model? · flyingjohn · opened 1 year ago · 1 comment
#167 · The docs are not updated with the source code · trinhtuanvubk · opened 1 year ago · 0 comments
#166 · Failed to run on H100 GPUs with tensor parallelism = 8 · sfc-gh-zhwang · opened 1 year ago · 5 comments
#165 · How to deploy multiple models on a node with multiple GPUs · jjjjohnson · opened 1 year ago · 0 comments
#164 · Memory usage is doubled when loading an fp16 model into bf16 · skyser2003 · opened 1 year ago · 2 comments
#163 · Throughput (requests per second, RPS) not increasing when scaling up from 1 GPU to 4 GPUs · chunyat · opened 1 year ago · 0 comments
#162 · Can I stop execution in `decoupled mode`? · Yeom · opened 1 year ago · 1 comment
#161 · Do I need to specify ARG SM=80 when building the image manually? · sfc-gh-zhwang · opened 1 year ago · 0 comments
#160 · Is is_return_log_probs required for a decoupled model? · flexwang · opened 1 year ago · 0 comments
#159 · Updated README.md to refer to 23.05 instead of 23.04 · mshuffett · opened 1 year ago · 0 comments
#158 · Fix fastertransformer build for 23.05+ · jbkyang-nvi · closed 1 year ago · 2 comments
#157 · [FT][ERROR] CUDA runtime error: out of memory at /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/allocator.h:220 · bigmover · closed 1 year ago · 1 comment
#156 · Starting a Triton server gives error: Caught signal 11 (Segmentation fault: address not mapped to object at address (nil)) · bigmover · closed 1 year ago · 3 comments
#155 · Can I enable streaming on an ensemble model? · flexwang · opened 1 year ago · 3 comments
#153 · Fix Docker build to work with the Triton 23.05 image · samiur · closed 1 year ago · 2 comments
#152 · huggingface_bert_convert.py can't convert some keys · SeungjaeLim · opened 1 year ago · 0 comments
#151 · INT8 support for GPT-J & GPT-NeoX · rahuan · opened 1 year ago · 0 comments
#150 · Failing to build with Triton 23.04 · bronzafa · opened 1 year ago · 2 comments
#149 · Dynamic batching does not work in decoupled mode · safehumeng · closed 1 year ago · 1 comment
#148 · Is DeBERTa supported in the fastertransformer backend? · sfc-gh-zhwang · opened 1 year ago · 0 comments
#147 · Why does the model config for BERT use instance group CPU instead of GPU? · sfc-gh-zhwang · closed 1 year ago · 0 comments
#146 · Enable Llama model in FT backend · shihy52x · opened 1 year ago · 1 comment
#145 · Add note on build instructions · jbkyang-nvi · closed 1 year ago · 0 comments
#144 · Poll failed for model directory 'ensemble': output 'OUTPUT_0' for ensemble 'ensemble' is not written · songkq · opened 1 year ago · 0 comments
#143 · Why does max_batch_size need to be set to 1 in interactive mode? · zhypku · opened 1 year ago · 0 comments
#142 · Why is processing requests with batch size = 1 much slower than with batch size > 1? · mapcan · opened 1 year ago · 0 comments
#140 · FasterTransformer backend fails to build using the latest version of Triton Server · mshuffett · opened 1 year ago · 2 comments
#139 · How to terminate a gRPC streaming request immediately during tritonserver inference with the FasterTransformer backend? · songkq · opened 1 year ago · 1 comment
#138 · Triton support using the fastertransformer backend for flan-ul2 and flan-ul2-alpaca-lora · ma-siddiqui · opened 1 year ago · 0 comments
#137 · Config file for flan-ul2-alpaca-lora (config.pbtxt) · ma-siddiqui · opened 1 year ago · 0 comments
#136 · flan-ul2 sample config.pbtxt · ma-siddiqui · opened 1 year ago · 0 comments
#135 · NCCL 'unhandled cuda error' · SeibertronSS · closed 1 year ago · 0 comments
#134 · Error running end_to_end_test_llama.py · SherronBurtint · opened 1 year ago · 0 comments
#132 · Feature request: conversion from GPTBigCodeForCausalLM / StarCoder · michaelfeil · opened 1 year ago · 1 comment
#131 · What can I do when generation gets stuck? · amazingkmy · opened 1 year ago · 0 comments
#130 · When hot-loading a large model, a segmentation fault occurs · ppppppppig · opened 1 year ago · 1 comment
#128 · Fix Docker multi-stage build error · levinxo · closed 1 year ago · 4 comments
#127 · CUDA: Operation Not Supported · nicobasile · opened 1 year ago · 1 comment
#126 · An error occurred while compiling the debug version · hongqing1986 · opened 1 year ago · 0 comments
#125 · GPT-J model produces garbage results · BDODigitalTeam · opened 1 year ago · 0 comments
#124 · Update GPT-NeoX guide with rendered table · ankit-db · closed 1 year ago · 0 comments
#123 · Converting nemo-megatron-mt5-3B to fastertransformer binary files succeeds, but tritonserver fails when loading models with an unmatched bias.bin · songkq · closed 1 year ago · 2 comments