huggingface / tgi-gaudi

Large Language Model Text Generation Inference on Habana Gaudi
http://hf.co/docs/text-generation-inference
Apache License 2.0

Enabling Flash Attention support for Falcon models #232

Closed tthakkal closed 1 month ago

tthakkal commented 1 month ago

What does this PR do?

Enables Flash Attention support in TGI for Falcon models.
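
The PR description does not include the diff, but the idea is to route Falcon attention through a fused Flash Attention kernel on Gaudi instead of the eager attention path. Below is a minimal sketch of that gating, assuming Habana's `FusedSDPA` kernel from `habana_frameworks` and a hypothetical `FLASH_ATTENTION` environment variable; the actual PR may wire this differently (e.g. via optimum-habana model kwargs).

```python
# Minimal sketch (not the actual PR diff): gate Falcon attention between a
# fused Flash Attention kernel on Gaudi and the eager SDPA fallback.
import os

import torch
import torch.nn.functional as F

try:
    # Habana's fused scaled-dot-product-attention kernel (Gaudi only).
    from habana_frameworks.torch.hpex.kernels import FusedSDPA
except ImportError:
    FusedSDPA = None

# Hypothetical switch for illustration; the real PR may expose this differently.
USE_FLASH_ATTENTION = os.getenv("FLASH_ATTENTION", "true").lower() == "true"


def falcon_attention(query, key, value, is_causal=True):
    """Attention for Falcon-style [batch, heads, seq_len, head_dim] tensors."""
    if USE_FLASH_ATTENTION and FusedSDPA is not None:
        # Fused kernel avoids materializing the full seq_len x seq_len score matrix.
        return FusedSDPA.apply(query, key, value, None, 0.0, is_causal)
    # Fallback: eager scaled dot-product attention.
    return F.scaled_dot_product_attention(query, key, value, is_causal=is_causal)
```

The fused path matters most for long prompts, where the quadratic attention score matrix dominates memory use; the fallback keeps the model usable when the kernel is unavailable.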

Fixes # (issue)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.