huggingface / tgi-gaudi

Large Language Model Text Generation Inference on Habana Gaudi
http://hf.co/docs/text-generation-inference
Apache License 2.0
28 stars, 47 forks

Enabled fused_sdpa flash attention for starcoder2 model #202

Status: Closed (tthakkal closed this 3 months ago)

tthakkal commented 3 months ago

What does this PR do?

Enables fused_sdpa flash attention for the starcoder2 model.
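
For readers unfamiliar with the term, the following is a minimal reference sketch of the computation that a fused scaled dot-product attention (SDPA) kernel performs in a single pass: softmax(QK^T / sqrt(d)) V, optionally with a causal mask. It is plain Python for illustration only and is not the code from this PR; the function names `softmax` and `sdpa_reference` are hypothetical, and the actual Gaudi kernel and its call signature are not shown on this page.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa_reference(q, k, v, is_causal=False):
    """Unfused reference for scaled dot-product attention.

    q: list of query vectors, shape [Lq][d]
    k: list of key vectors,   shape [Lk][d]
    v: list of value vectors, shape [Lk][dv]
    Returns a list of output vectors, shape [Lq][dv].
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            if is_causal and j > i:
                # Causal mask: position i must not attend to later positions.
                scores.append(float("-inf"))
            else:
                scores.append(scale * sum(a * b for a, b in zip(qi, kj)))
        weights = softmax(scores)
        # Weighted sum of value vectors, one output column at a time.
        out.append(
            [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))]
        )
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 0.0], [0.0, 1.0]]
    print(sdpa_reference(q, k, v, is_causal=True))
```

A fused kernel computes the same result without materializing the full score matrix in memory, which is where the speed and memory savings usually come from.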

Fixes # (issue)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.