huggingface / tgi-gaudi
Large Language Model Text Generation Inference on Habana Gaudi
http://hf.co/docs/text-generation-inference
Apache License 2.0 · 27 stars · 47 forks
Issues (newest first)
| # | Title | Author | Status | Comments |
|---|-------|--------|--------|----------|
| #248 | tgi-gaudi server error with long inputs sent to chat_completion api using openai python sdk | minmin-intel | opened 3 days ago | 0 |
| #247 | Remove nvidia packages | yuanwu2017 | opened 1 week ago | 0 |
| #246 | Remove the torch package in requirements.txt | yuanwu2017 | closed 2 weeks ago | 5 |
| #245 | With this change, bucketing/padding of input is applied to health check. | srajabos | closed 1 week ago | 6 |
| #244 | Adding Universal Assisted Generation | edlee123 | opened 3 weeks ago | 0 |
| #243 | Update health.rs | srajabos | closed 2 weeks ago | 0 |
| #242 | updated release version to 2.0.6 | tthakkal | closed 3 weeks ago | 0 |
| #241 | updated supported models list table in readme | tthakkal | closed 3 weeks ago | 0 |
| #240 | Upgrade SynapseAI version to 1.18.0 | tthaddey | closed 3 weeks ago | 2 |
| #239 | Revert gemma flash attention as its not fully enabled in OH | tthakkal | closed 1 month ago | 1 |
| #238 | Incorrect answer with openai compatible penalty parameters | Spycsh | opened 1 month ago | 1 |
| #237 | requirements name - cabelo@opensuse.org | cabelo | closed 1 month ago | 0 |
| #236 | Remove References to torch compile mode in readme | tthakkal | closed 1 month ago | 1 |
| #235 | Enables Flash Attention in TGI for gemma models | tthakkal | closed 1 month ago | 0 |
| #234 | set ignore EOS by using TEXT_GENERATION_SERVER_IGNORE_EOS_TOKEN | schoi-habana | closed 1 month ago | 1 |
| #233 | Fix sysntax error in PR 232 | tthakkal | closed 1 month ago | 0 |
| #232 | Enabling Flash Attention support for falcon model | tthakkal | closed 1 month ago | 0 |
| #231 | OH tag ci_11102024 keeping synapse 1.17 | schoi-habana | closed 1 month ago | 0 |
| #230 | Upgrade Synapse and DS to 1.18 | schoi-habana | closed 1 month ago | 0 |
| #229 | Remove all references to habana_quantization_toolkit for 1.18 | tthakkal | closed 1 month ago | 1 |
| #228 | Fix gpt_bigcode/starcoderbase-3b accuracy issue | schoi-habana | closed 1 month ago | 1 |
| #227 | upgrade to SynapseAI 1.18 | yuanwu2017 | closed 3 weeks ago | 18 |
| #226 | Enhancements to README | MohitIntel | closed 1 month ago | 1 |
| #225 | Upgrade to 2.3.1 | yuanwu2017 | opened 2 months ago | 3 |
| #224 | Removed functions iterating over tensors from torch compilation process | jczaja | opened 2 months ago | 0 |
| #223 | Generation stopped too early without hitting stop condition | minmin-intel | opened 2 months ago | 7 |
| #222 | Upgrade to Optimum Habana v1.13.2 | regisss | closed 2 months ago | 0 |
| #221 | Update README.md with changes related to LLava-next multi card support | tthakkal | closed 2 months ago | 2 |
| #220 | Llava-next: Added flash_attention_recompute option | tthakkal | closed 2 months ago | 0 |
| #219 | Only Apply the TP in language_model | yuanwu2017 | closed 2 months ago | 17 |
| #218 | llama3.1-70B-instruct 422 error Template error: unknown test: test iterable is unknown (in <string>:99) | minmin-intel | opened 2 months ago | 2 |
| #217 | Enable the AutoGPTQ | yuanwu2017 | closed 2 months ago | 2 |
| #216 | When running llama2 7b, inference some 2k length prompt concurrently will cause TGI service crash. | yao531441 | closed 8 hours ago | 6 |
| #215 | Downgrade sympy to match synapaseAI 1.18 base image | tthakkal | closed 2 months ago | 2 |
| #214 | Make prefill time of static benchmark correct | schoi-habana | closed 3 months ago | 0 |
| #213 | readme changes | tthakkal | closed 3 months ago | 1 |
| #212 | Updated docker image version to 2.0.4 | tthakkal | closed 3 months ago | 1 |
| #211 | Do not merge: Release testing 2.0.4 | tthakkal | closed 3 months ago | 1 |
| #210 | Add qwen2 fp8 quant support | changwangss | closed 3 months ago | 1 |
| #209 | llava-next Fp8 | yuanwu2017 | closed 3 months ago | 4 |
| #208 | Upgrade SynapseAI version to 1.17.0 | yuanwu2017 | closed 3 months ago | 14 |
| #207 | GPTQ uint4 quantization broken | endomorphosis | opened 3 months ago | 2 |
| #206 | Resolved CVEs | ModiIntel | opened 3 months ago | 3 |
| #205 | Make bf16 default for hpu | abhilash1910 | closed 3 months ago | 0 |
| #204 | Schoi/llama3.1 tokenizer | endomorphosis | closed 3 months ago | 3 |
| #203 | Enable quantization with INC | tthakkal | closed 3 months ago | 3 |
| #202 | Enabled fused_sdpa flash attention for starcoder2 model | tthakkal | closed 3 months ago | 0 |
| #201 | Undo disable of hpu graphs for starcoder | vidyasiv | closed 3 months ago | 0 |
| #200 | Updated Readme to use flash attention for llama | tthakkal | closed 3 months ago | 1 |
| #199 | Pad token handling for Llama3.1 | schoi-habana | closed 3 months ago | 6 |
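Several entries in the list above concern runtime configuration of the tgi-gaudi server, e.g. #234 (the `TEXT_GENERATION_SERVER_IGNORE_EOS_TOKEN` environment variable) and the release/version PRs such as #242 and #212. As a rough sketch of how such options are typically passed, the server is launched as a Docker container with environment variables and launcher flags. The image tag, model id, value `true`, and Habana runtime flags below are assumptions based on common TGI deployment conventions, not details taken from this page:

```shell
# Hypothetical launch of the tgi-gaudi container. The image tag (2.0.6),
# model id, and Habana-specific flags are illustrative assumptions.
# TEXT_GENERATION_SERVER_IGNORE_EOS_TOKEN is the variable named in issue #234;
# the value "true" is assumed.
docker run -p 8080:80 \
  --runtime=habana \
  --cap-add=sys_nice --ipc=host \
  -e HABANA_VISIBLE_DEVICES=all \
  -e TEXT_GENERATION_SERVER_IGNORE_EOS_TOKEN=true \
  ghcr.io/huggingface/tgi-gaudi:2.0.6 \
  --model-id meta-llama/Llama-2-7b-hf \
  --max-input-length 2048
```

With a command of this shape, environment variables (`-e`) configure server-side behavior while trailing arguments are forwarded to the text-generation-launcher inside the container.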