huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0
8.36k stars, 948 forks

AWS Inferentia (inf1, inf2) support #688

Closed: OrigamiDream closed 2 months ago

OrigamiDream commented 11 months ago

Feature request

I can't find any guidance on integrating Hugging Face TGI with AWS Inferentia. I've found several deployment guides for individual end-to-end models, but none for autoregressive models such as CausalLM.

Therefore, I would like to request support for AWS Inferentia.

Motivation

SageMaker is expensive and rigid, unlike Serverless. Support for inf1 and inf2 instances would reduce the cost of cloud computing.

Your contribution

N/A

Narsil commented 11 months ago

Thanks. We currently don't support it because, to the best of my knowledge, there is no flash attention on Inferentia, which is an important piece of TGI.

We have started some work internally for specialized hardware, but it's a sizeable amount of work.

philschmid commented 11 months ago

The biggest challenge with Inferentia is the missing support for dynamic shapes.
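
As a rough sketch of what this means in practice, assuming a compiler that only accepts the fixed input shapes a model was compiled for: variable-length prompts have to be padded up to one of a few precompiled sizes. The bucket sizes, `PAD_ID`, and `pad_to_bucket` below are hypothetical names chosen for illustration; they are not part of TGI or the Neuron SDK.

```python
from typing import List, Sequence, Tuple

# Hypothetical values for illustration only.
BUCKETS: Tuple[int, ...] = (8, 16, 32)  # sequence lengths assumed to be precompiled
PAD_ID: int = 0                         # assumed padding token id


def pad_to_bucket(
    token_ids: List[int],
    buckets: Sequence[int] = BUCKETS,
    pad_id: int = PAD_ID,
) -> Tuple[List[int], List[int]]:
    """Pad token_ids to the smallest bucket that fits.

    Returns the padded ids and an attention mask (1 = real token, 0 = padding).
    Raises ValueError if the input is longer than every bucket.
    """
    n = len(token_ids)
    for size in sorted(buckets):
        if n <= size:
            padding = size - n
            return token_ids + [pad_id] * padding, [1] * n + [0] * padding
    raise ValueError(f"input of length {n} exceeds the largest bucket {max(buckets)}")


padded, mask = pad_to_bucket([5, 6, 7])
print(len(padded), mask)
```

Every request pays for the full bucket length, and anything longer than the largest bucket is rejected, which is the kind of constraint that makes serving autoregressive models with varying prompt and output lengths harder on static-shape hardware.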

nikitajz commented 10 months ago

This seems to be the AWS Neuron issue tracking dynamic shapes support: https://github.com/aws-neuron/aws-neuron-sdk/issues/564

muhammad-asn commented 7 months ago

Another tool from Hugging Face: https://github.com/huggingface/optimum-neuron/tree/main/text-generation-inference

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.