
LLM SageMaker Module

Terraform module for easily deploying open LLMs from Hugging Face to Amazon SageMaker real-time endpoints. The module creates all the resources necessary to deploy a model to Amazon SageMaker: an IAM role (if not provided), a SageMaker Model, a SageMaker Endpoint Configuration, and a SageMaker Endpoint.

With this module you can deploy Llama 3, Mistral, Mixtral, Command, and many more models from Hugging Face to Amazon SageMaker.

Usage

Basic example

module "sagemaker-huggingface" {
  source               = "philschmid/llm-sagemaker/aws"
  version              = "0.1.0"
  endpoint_name_prefix = "llama3"
  hf_model_id          = "meta-llama/Meta-Llama-3.1-8B-Instruct"
  hf_token             = "YOUR_HF_TOKEN_WITH_ACCESS_TO_THE_MODEL"
  instance_type        = "ml.g5.2xlarge"
  instance_count       = 1 # default is 1

  tgi_config = {
    max_input_tokens       = 4000 # max tokens in the prompt
    max_total_tokens       = 4096 # max prompt + generated tokens per request
    max_batch_total_tokens = 6144 # max tokens across a whole batch
  }
}
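
The module can also attach an Application Auto Scaling target and policy to the endpoint (see the aws_appautoscaling_* resources and the autoscaling input below). A minimal sketch, assuming the same model as above; the capacity and target values are illustrative:

module "sagemaker-huggingface" {
  source               = "philschmid/llm-sagemaker/aws"
  version              = "0.1.0"
  endpoint_name_prefix = "llama3"
  hf_model_id          = "meta-llama/Meta-Llama-3.1-8B-Instruct"
  hf_token             = "YOUR_HF_TOKEN_WITH_ACCESS_TO_THE_MODEL"
  instance_type        = "ml.g5.2xlarge"

  # Illustrative values: scale between 1 and 4 instances, targeting
  # 200 invocations per instance. See the Inputs table for the full schema.
  autoscaling = {
    min_capacity               = 1
    max_capacity               = 4
    scaling_target_invocations = 200
  }
}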

Further examples can be found in the repository.

Run Tests

The tests are written in Go and expect valid AWS credentials (here supplied via the hf-sm profile):

AWS_PROFILE=hf-sm AWS_DEFAULT_REGION=us-east-1 go test -v

License

MIT License. See LICENSE for full details.

Requirements

| Name | Version |
|------|---------|
| aws | 5.60.0 |
| random | 3.1.0 |

Providers

| Name | Version |
|------|---------|
| aws | 5.60.0 |
| random | 3.1.0 |

Modules

No modules.

Resources

| Name | Type |
|------|------|
| aws_appautoscaling_policy.sagemaker_policy | resource |
| aws_appautoscaling_target.sagemaker_target | resource |
| aws_iam_role.new_role | resource |
| aws_sagemaker_endpoint.llm | resource |
| aws_sagemaker_endpoint_configuration.llm | resource |
| aws_sagemaker_model.huggingface_hub_model | resource |
| random_string.suffix | resource |
| aws_iam_role.get_role | data source |
| aws_region.current | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| autoscaling | An object that defines the autoscaling target and policy for the SageMaker endpoint. Required keys are `max_capacity` and `scaling_target_invocations`. | `object({ min_capacity = optional(number), max_capacity = number, scaling_target_invocations = optional(number), scale_in_cooldown = optional(number), scale_out_cooldown = optional(number) })` | `{ max_capacity = null, min_capacity = 1, scale_in_cooldown = 300, scale_out_cooldown = 66, scaling_target_invocations = null }` | no |
| endpoint_name_prefix | Prefix for the name of the SageMaker endpoint. | `string` | n/a | yes |
| hf_model_id | The Hugging Face model ID to deploy. | `string` | n/a | yes |
| hf_token | The Hugging Face API token. | `string` | `null` | no |
| instance_count | The initial number of instances to run in the endpoint created from this model. | `number` | `1` | no |
| instance_type | The EC2 instance type to deploy this model to, for example `ml.g5.xlarge`. | `string` | `null` | no |
| llm_container | URI of the Docker image containing the model. | `string` | `null` | no |
| sagemaker_execution_role | Name of an AWS IAM role used to access training data and model artifacts. After the endpoint is created, the inference code may use this role to access other AWS resources. If not specified, a role is created with the CreateModel permissions from the documentation. | `string` | `null` | no |
| tags | A map of tags (key-value pairs) passed to resources. | `map(string)` | `{}` | no |
| tgi_config | The configuration for the TGI model. | `object({ max_input_tokens = number, max_total_tokens = number, max_batch_total_tokens = number })` | `{ max_batch_total_tokens = 8192, max_input_tokens = 2048, max_total_tokens = 4096 }` | no |
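
For example, to reuse an existing execution role instead of letting the module create one, pass its name via sagemaker_execution_role. A minimal sketch; the role name and tag are hypothetical, and the role must already exist with the permissions described above:

module "sagemaker-huggingface" {
  source               = "philschmid/llm-sagemaker/aws"
  version              = "0.1.0"
  endpoint_name_prefix = "llama3"
  hf_model_id          = "meta-llama/Meta-Llama-3.1-8B-Instruct"
  hf_token             = "YOUR_HF_TOKEN_WITH_ACCESS_TO_THE_MODEL"
  instance_type        = "ml.g5.2xlarge"

  # Hypothetical IAM role name; must already exist and grant the
  # CreateModel permissions described in the input table above.
  sagemaker_execution_role = "my-sagemaker-execution-role"

  tags = {
    Environment = "dev" # illustrative tag
  }
}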

Outputs

| Name | Description |
|------|-------------|
| container | n/a |
| iam_role | IAM role used by the endpoint |
| sagemaker_endpoint | Created Amazon SageMaker endpoint resource |
| sagemaker_endpoint_configuration | Created Amazon SageMaker endpoint configuration resource |
| sagemaker_endpoint_name | Name of the created Amazon SageMaker endpoint; use this to invoke the endpoint with the AWS SDKs |
| sagemaker_model | Created Amazon SageMaker model resource |
| tags | n/a |
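
Outputs can be re-exported from your root module or wired into other resources. A small sketch using sagemaker_endpoint_name, assuming the module label from the usage example above:

output "endpoint_name" {
  description = "Name of the deployed SageMaker endpoint, used when invoking it with the AWS SDKs"
  value       = module.sagemaker-huggingface.sagemaker_endpoint_name
}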