Terraform module for easily deploying open LLMs from Hugging Face to Amazon SageMaker real-time endpoints. The module creates all resources needed to deploy a model to Amazon SageMaker: an IAM role (if not provided), a SageMaker Model, a SageMaker Endpoint Configuration, and a SageMaker Endpoint.

With this module you can deploy Llama 3, Mistral, Mixtral, Command, and many more models from Hugging Face to Amazon SageMaker.
## Basic example

```hcl
module "sagemaker-huggingface" {
  source  = "philschmid/llm-sagemaker/aws"
  version = "0.1.0"

  endpoint_name_prefix = "llama3"

  hf_model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
  hf_token    = "YOUR_HF_TOKEN_WITH_ACCESS_TO_THE_MODEL"

  instance_type  = "ml.g5.2xlarge"
  instance_count = 1 # default is 1

  tgi_config = {
    max_input_tokens       = 4000
    max_total_tokens       = 4096
    max_batch_total_tokens = 6144
  }
}
```
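Autoscaling can also be enabled through the `autoscaling` input; per the input description, `max_capacity` and `scaling_target_invocations` are its required keys. A minimal sketch with illustrative values:

```hcl
module "sagemaker-huggingface" {
  source  = "philschmid/llm-sagemaker/aws"
  version = "0.1.0"

  endpoint_name_prefix = "llama3"
  hf_model_id          = "meta-llama/Meta-Llama-3.1-8B-Instruct"
  instance_type        = "ml.g5.2xlarge"

  # Illustrative values: scale out to at most 2 instances once the
  # per-instance invocation rate crosses the target.
  autoscaling = {
    max_capacity               = 2
    scaling_target_invocations = 200
  }
}
```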
## Tests

Example:

```shell
AWS_PROFILE=hf-sm AWS_DEFAULT_REGION=us-east-1 go test -v
```
## License

MIT License. See LICENSE for full details.
## Requirements

| Name | Version |
|------|---------|
| aws | 5.60.0 |
| random | 3.1.0 |
## Providers

| Name | Version |
|------|---------|
| aws | 5.60.0 |
| random | 3.1.0 |
## Modules

No modules.
## Resources

| Name | Type |
|------|------|
| aws_appautoscaling_policy.sagemaker_policy | resource |
| aws_appautoscaling_target.sagemaker_target | resource |
| aws_iam_role.new_role | resource |
| aws_sagemaker_endpoint.llm | resource |
| aws_sagemaker_endpoint_configuration.llm | resource |
| aws_sagemaker_model.huggingface_hub_model | resource |
| random_string.suffix | resource |
| aws_iam_role.get_role | data source |
| aws_region.current | data source |
## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| autoscaling | An object which defines the autoscaling target and policy for our SageMaker endpoint. Required keys are `max_capacity` and `scaling_target_invocations`. | `object({…})` | `{…}` | no |
| endpoint_name_prefix | Prefix for the name of the SageMaker endpoint | `string` | n/a | yes |
| hf_model_id | The Hugging Face model ID to deploy | `string` | n/a | yes |
| hf_token | The Hugging Face API token | `string` | `null` | no |
| instance_count | The initial number of instances to run in the endpoint created from this model. Defaults to 1. | `number` | `1` | no |
| instance_type | The EC2 instance type to deploy this model to, for example `ml.g5.xlarge`. | `string` | `null` | no |
| llm_container | URI of the Docker image containing the model | `string` | `null` | no |
| sagemaker_execution_role | Name of an AWS IAM role used to access training data and model artifacts. After the endpoint is created, the inference code might use the IAM role if it needs to access some AWS resources. If not specified, the role will be created with the CreateModel permissions from the documentation. | `string` | `null` | no |
| tags | A map of tags (key-value pairs) passed to resources. | `map(string)` | `{}` | no |
| tgi_config | The configuration for the TGI model | `object({…})` | `{…}` | no |
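To reuse an existing execution role instead of letting the module create one, pass its name via `sagemaker_execution_role`. A sketch (the role name below is hypothetical):

```hcl
module "sagemaker-huggingface" {
  source  = "philschmid/llm-sagemaker/aws"
  version = "0.1.0"

  endpoint_name_prefix = "llama3"
  hf_model_id          = "meta-llama/Meta-Llama-3.1-8B-Instruct"
  instance_type        = "ml.g5.2xlarge"

  # Hypothetical role name; the role must already carry the
  # CreateModel permissions described above.
  sagemaker_execution_role = "my-sagemaker-execution-role"
}
```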
## Outputs

| Name | Description |
|------|-------------|
| container | n/a |
| iam_role | IAM role used in the endpoint |
| sagemaker_endpoint | Created Amazon SageMaker endpoint resource |
| sagemaker_endpoint_configuration | Created Amazon SageMaker endpoint configuration resource |
| sagemaker_endpoint_name | Name of the created Amazon SageMaker endpoint, used for invoking the endpoint with SDKs |
| sagemaker_model | Created Amazon SageMaker model resource |
| tags | n/a |
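The `sagemaker_endpoint_name` output is what you pass to an SDK when invoking the deployed endpoint. A minimal Python sketch, assuming boto3 is installed with AWS credentials configured and using the standard TGI `inputs`/`parameters` request shape (the helper function names here are our own, not part of the module):

```python
import json

def build_tgi_request(prompt, max_new_tokens=256, temperature=0.7):
    """Build the JSON body the TGI container expects."""
    return json.dumps({
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    })

def invoke(endpoint_name, prompt):
    """Invoke a deployed endpoint. Requires boto3 and AWS credentials;
    endpoint_name comes from the module output `sagemaker_endpoint_name`."""
    import boto3
    client = boto3.client("sagemaker-runtime")
    resp = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_tgi_request(prompt),
    )
    return json.loads(resp["Body"].read())

body = build_tgi_request("What is Amazon SageMaker?")
print(body)
```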