Closed: OrigamiDream closed this issue 2 months ago.
Thanks. We currently don't support it because, to the best of my knowledge, there is no flash attention on Inferentia, and flash attention is an important piece of TGI.
We have started some work internally for specialized hardware, but it's a sizeable amount of work.
The biggest challenge with Inferentia is the lack of support for dynamic shapes.
This appears to be the AWS Neuron issue tracking dynamic-shape support: https://github.com/aws-neuron/aws-neuron-sdk/issues/564
Another tool from Hugging Face: https://github.com/huggingface/optimum-neuron/tree/main/text-generation-inference
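Here is an illustration of what missing dynamic-shape support implies. A compiler that only accepts fixed input shapes typically forces every request to be padded up to one of a few precompiled lengths. The sketch below shows that bucketing idea in plain Python. The bucket sizes, the pad token id, and the `pad_to_bucket` helper are hypothetical and are not part of TGI or the Neuron SDK.

```python
from bisect import bisect_left

# Hypothetical bucket lengths: with static shapes, each bucket would need
# its own precompiled graph (values are illustrative only).
BUCKETS = [128, 512, 1024, 2048]
PAD_ID = 0  # hypothetical padding token id


def pad_to_bucket(token_ids, buckets=BUCKETS, pad_id=PAD_ID):
    """Pad a variable-length sequence up to the smallest bucket that fits.

    Returns (padded_ids, attention_mask), where the mask is 1 for real
    tokens and 0 for padding. Raises ValueError if the sequence is longer
    than the largest bucket.
    """
    n = len(token_ids)
    # Index of the first bucket whose length is >= n.
    i = bisect_left(buckets, n)
    if i == len(buckets):
        raise ValueError(f"sequence length {n} exceeds largest bucket {buckets[-1]}")
    target = buckets[i]
    padding = target - n
    padded = list(token_ids) + [pad_id] * padding
    mask = [1] * n + [0] * padding
    return padded, mask


padded, mask = pad_to_bucket([5, 6, 7])
print(len(padded), sum(mask))
```

In this sketch a 3-token prompt still occupies a 128-slot input, so short requests pay for wasted compute. Every bucket also has to be compiled ahead of time, which is roughly why dynamic-shape support matters for a server like TGI that handles arbitrary prompt lengths.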
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Feature request
I can't find any guidance on integrating Hugging Face TGI with AWS Inferentia. I've found several deployment guides for individual end-to-end models, but none for autoregressive models such as CausalLM.
Therefore, I would like to request support for AWS Inferentia.
Motivation
SageMaker is expensive and rigid, unlike serverless options. Support for inf1 and inf2 instances would reduce cloud computing costs.
Your contribution
N/A