huggingface / tgi-gaudi
Large Language Model Text Generation Inference on Habana Gaudi
http://hf.co/docs/text-generation-inference
Apache License 2.0 · 27 stars · 47 forks
Issues (newest first)
| # | Title | Author | Status | Comments |
|---|-------|--------|--------|----------|
| #248 | tgi-gaudi server error with long inputs sent to chat_completion api using openai python sdk | minmin-intel | opened 3 days ago | 0 |
| #247 | Remove nvidia packages | yuanwu2017 | opened 1 week ago | 0 |
| #246 | Remove the torch package in requirements.txt | yuanwu2017 | closed 2 weeks ago | 5 |
| #245 | With this change, bucketing/padding of input is applied to health check. | srajabos | closed 1 week ago | 6 |
| #244 | Adding Universal Assisted Generation | edlee123 | opened 3 weeks ago | 0 |
| #243 | Update health.rs | srajabos | closed 2 weeks ago | 0 |
| #242 | updated release version to 2.0.6 | tthakkal | closed 3 weeks ago | 0 |
| #241 | updated supported models list table in readme | tthakkal | closed 3 weeks ago | 0 |
| #240 | Upgrade SynapseAI version to 1.18.0 | tthaddey | closed 3 weeks ago | 2 |
| #239 | Revert gemma flash attention as its not fully enabled in OH | tthakkal | closed 1 month ago | 1 |
| #238 | Incorrect answer with openai compatible penalty parameters | Spycsh | opened 1 month ago | 1 |
| #237 | requirements name - cabelo@opensuse.org | cabelo | closed 1 month ago | 0 |
| #236 | Remove References to torch compile mode in readme | tthakkal | closed 1 month ago | 1 |
| #235 | Enables Flash Attention in TGI for gemma models | tthakkal | closed 1 month ago | 0 |
| #234 | set ignore EOS by using TEXT_GENERATION_SERVER_IGNORE_EOS_TOKEN | schoi-habana | closed 1 month ago | 1 |
| #233 | Fix sysntax error in PR 232 | tthakkal | closed 1 month ago | 0 |
| #232 | Enabling Flash Attention support for falcon model | tthakkal | closed 1 month ago | 0 |
| #231 | OH tag ci_11102024 keeping synapse 1.17 | schoi-habana | closed 1 month ago | 0 |
| #230 | Upgrade Synapse and DS to 1.18 | schoi-habana | closed 1 month ago | 0 |
| #229 | Remove all references to habana_quantization_toolkit for 1.18 | tthakkal | closed 1 month ago | 1 |
| #228 | Fix gpt_bigcode/starcoderbase-3b accuracy issue | schoi-habana | closed 1 month ago | 1 |
| #227 | upgrade to SynapseAI 1.18 | yuanwu2017 | closed 3 weeks ago | 18 |
| #226 | Enhancements to README | MohitIntel | closed 1 month ago | 1 |
| #225 | Upgrade to 2.3.1 | yuanwu2017 | opened 2 months ago | 3 |
| #224 | Removed functions iterating over tensors from torch compilation process | jczaja | opened 2 months ago | 0 |
| #223 | Generation stopped too early without hitting stop condition | minmin-intel | opened 2 months ago | 7 |
| #222 | Upgrade to Optimum Habana v1.13.2 | regisss | closed 2 months ago | 0 |
| #221 | Update README.md with changes related to LLava-next multi card support | tthakkal | closed 2 months ago | 2 |
| #220 | Llava-next: Added flash_attention_recompute option | tthakkal | closed 2 months ago | 0 |
| #219 | Only Apply the TP in language_model | yuanwu2017 | closed 2 months ago | 17 |
| #218 | llama3.1-70B-instruct 422 error Template error: unknown test: test iterable is unknown (in <string>:99) | minmin-intel | opened 2 months ago | 2 |
| #217 | Enable the AutoGPTQ | yuanwu2017 | closed 2 months ago | 2 |
| #216 | When running llama2 7b, inference some 2k length prompt concurrently will cause TGI service crash. | yao531441 | closed 8 hours ago | 6 |
| #215 | Downgrade sympy to match synapaseAI 1.18 base image | tthakkal | closed 2 months ago | 2 |
| #214 | Make prefill time of static benchmark correct | schoi-habana | closed 3 months ago | 0 |
| #213 | readme changes | tthakkal | closed 3 months ago | 1 |
| #212 | Updated docker image version to 2.0.4 | tthakkal | closed 3 months ago | 1 |
| #211 | Do not merge: Release testing 2.0.4 | tthakkal | closed 3 months ago | 1 |
| #210 | Add qwen2 fp8 quant support | changwangss | closed 3 months ago | 1 |
| #209 | llava-next Fp8 | yuanwu2017 | closed 3 months ago | 4 |
| #208 | Upgrade SynapseAI version to 1.17.0 | yuanwu2017 | closed 3 months ago | 14 |
| #207 | GPTQ uint4 quantization broken | endomorphosis | opened 3 months ago | 2 |
| #206 | Resolved CVEs | ModiIntel | opened 3 months ago | 3 |
| #205 | Make bf16 default for hpu | abhilash1910 | closed 3 months ago | 0 |
| #204 | Schoi/llama3.1 tokenizer | endomorphosis | closed 3 months ago | 3 |
| #203 | Enable quantization with INC | tthakkal | closed 3 months ago | 3 |
| #202 | Enabled fused_sdpa flash attention for starcoder2 model | tthakkal | closed 3 months ago | 0 |
| #201 | Undo disable of hpu graphs for starcoder | vidyasiv | closed 3 months ago | 0 |
| #200 | Updated Readme to use flash attention for llama | tthakkal | closed 3 months ago | 1 |
| #199 | Pad token handling for Llama3.1 | schoi-habana | closed 3 months ago | 6 |
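Several entries in the list above concern runtime configuration of the tgi-gaudi server, e.g. #234 (the `TEXT_GENERATION_SERVER_IGNORE_EOS_TOKEN` environment variable) and the release/version PRs such as #242 and #212. As a rough sketch of how such options are typically passed, the server is launched as a Docker container with environment variables and launcher flags. The image tag, model id, value `true`, and Habana runtime flags below are assumptions based on common TGI deployment conventions, not details taken from this page:

```shell
# Hypothetical launch of the tgi-gaudi container. The image tag (2.0.6),
# model id, and Habana-specific flags are illustrative assumptions.
# TEXT_GENERATION_SERVER_IGNORE_EOS_TOKEN is the variable named in issue #234;
# the value "true" is assumed.
docker run -p 8080:80 \
  --runtime=habana \
  --cap-add=sys_nice --ipc=host \
  -e HABANA_VISIBLE_DEVICES=all \
  -e TEXT_GENERATION_SERVER_IGNORE_EOS_TOKEN=true \
  ghcr.io/huggingface/tgi-gaudi:2.0.6 \
  --model-id meta-llama/Llama-2-7b-hf \
  --max-input-length 2048
```

With a command of this shape, environment variables (`-e`) configure server-side behavior while trailing arguments are forwarded to the text-generation-launcher inside the container.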