
LLM SageMaker Module

Terraform module for easily deploying open LLMs from Hugging Face to Amazon SageMaker real-time endpoints. This module creates all the resources needed to deploy a model to Amazon SageMaker: an IAM role (if not provided), a SageMaker Model, a SageMaker Endpoint Configuration, and a SageMaker Endpoint.

With this module you can deploy Llama 3, Mistral, Mixtral, Command and many more models from Hugging Face to Amazon SageMaker.

Usage

Basic example

module "sagemaker-huggingface" {
  source               = "philschmid/llm-sagemaker/aws"
  version              = "0.1.0"
  endpoint_name_prefix = "llama3"
  hf_model_id          = "meta-llama/Meta-Llama-3.1-8B-Instruct"
  hf_token             = "YOUR_HF_TOKEN_WITH_ACCESS_TO_THE_MODEL"
  instance_type        = "ml.g5.2xlarge"
  instance_count       = 1 # default is 1

  tgi_config = {
    max_input_tokens       = 4000
    max_total_tokens       = 4096
    max_batch_total_tokens = 6144
  }
}

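The module can also attach an Application Auto Scaling policy to the endpoint through the autoscaling input (documented under Inputs below). A minimal sketch; the capacity limits and invocation target here are illustrative values, not module defaults:

module "sagemaker-huggingface-autoscaling" {
  source               = "philschmid/llm-sagemaker/aws"
  version              = "0.1.0"
  endpoint_name_prefix = "llama3"
  hf_model_id          = "meta-llama/Meta-Llama-3.1-8B-Instruct"
  hf_token             = "YOUR_HF_TOKEN_WITH_ACCESS_TO_THE_MODEL"
  instance_type        = "ml.g5.2xlarge"

  autoscaling = {
    min_capacity               = 1   # illustrative
    max_capacity               = 4   # illustrative
    scaling_target_invocations = 200 # target invocations per instance, illustrative
  }
}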

Run Tests

The tests are written in Go and run against a real AWS account; set your AWS profile and region accordingly:

AWS_PROFILE=hf-sm AWS_DEFAULT_REGION=us-east-1 go test -v

License

MIT License. See LICENSE for full details.

Requirements

Name Version
aws 5.60.0
random 3.1.0

Providers

Name Version
aws 5.60.0
random 3.1.0

Modules

No modules.

Resources

Name Type
aws_appautoscaling_policy.sagemaker_policy resource
aws_appautoscaling_target.sagemaker_target resource
aws_iam_role.new_role resource
aws_sagemaker_endpoint.llm resource
aws_sagemaker_endpoint_configuration.llm resource
aws_sagemaker_model.huggingface_hub_model resource
random_string.suffix resource
aws_iam_role.get_role data source
aws_region.current data source

Inputs

autoscaling (object, optional)
  An object which defines the autoscaling target and policy for the SageMaker endpoint. Required keys are max_capacity and scaling_target_invocations.
  Type:
    object({
      min_capacity               = optional(number)
      max_capacity               = number
      scaling_target_invocations = optional(number)
      scale_in_cooldown          = optional(number)
      scale_out_cooldown         = optional(number)
    })
  Default:
    {
      "max_capacity": null,
      "min_capacity": 1,
      "scale_in_cooldown": 300,
      "scale_out_cooldown": 66,
      "scaling_target_invocations": null
    }

endpoint_name_prefix (string, required)
  Prefix for the name of the SageMaker endpoint.

hf_model_id (string, required)
  The Hugging Face model ID to deploy.

hf_token (string, optional, default: null)
  The Hugging Face API token.

instance_count (number, optional, default: 1)
  The initial number of instances to run in the endpoint created from this model.

instance_type (string, optional, default: null)
  The EC2 instance type to deploy this model to, for example ml.g5.xlarge.

llm_container (string, optional, default: null)
  URI of the Docker image containing the model.

sagemaker_execution_role (string, optional, default: null)
  The name of an AWS IAM role used to access training data and model artifacts. After the endpoint is created, the inference code may use this role to access other AWS resources. If not specified, a role is created with the CreateModel permissions from the documentation.

tags (map(string), optional, default: {})
  A map of tags (key-value pairs) passed to resources.

tgi_config (object, optional)
  The configuration for the TGI model.
  Type:
    object({
      max_input_tokens       = number
      max_total_tokens       = number
      max_batch_total_tokens = number
    })
  Default:
    {
      "max_batch_total_tokens": 8192,
      "max_input_tokens": 2048,
      "max_total_tokens": 4096
    }
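
As an example of the sagemaker_execution_role input, an existing IAM role can be reused instead of having the module create one by passing its name (not ARN); the role name below is a placeholder:

module "sagemaker-huggingface" {
  source                   = "philschmid/llm-sagemaker/aws"
  version                  = "0.1.0"
  endpoint_name_prefix     = "llama3"
  hf_model_id              = "meta-llama/Meta-Llama-3.1-8B-Instruct"
  instance_type            = "ml.g5.2xlarge"
  sagemaker_execution_role = "my-sagemaker-execution-role" # placeholder: name of an existing role
}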

Outputs

Name Description
container n/a
iam_role IAM role used in the endpoint
sagemaker_endpoint created Amazon SageMaker endpoint resource
sagemaker_endpoint_configuration created Amazon SageMaker endpoint configuration resource
sagemaker_endpoint_name Name of the created Amazon SageMaker endpoint, used for invoking the endpoint with SDKs
sagemaker_model created Amazon SageMaker model resource
tags n/a
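
As a sketch, the sagemaker_endpoint_name output can be surfaced from the calling configuration so SDK clients know which endpoint to invoke:

output "endpoint_name" {
  description = "Name of the SageMaker endpoint, to pass to invoke_endpoint SDK calls"
  value       = module.sagemaker-huggingface.sagemaker_endpoint_name
}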