Open Bryson14 opened 7 months ago
Hey @Bryson14 👋 Thank you for taking the time to raise this! As a heads up, we consider adding additional arguments to existing resources to be an enhancement, so I've updated the labels with that in mind.
Hi @Bryson14, are you able to provide the ECR image you used for sagemaker_mistral_public_image? Also, a working example, either in the CLI or anywhere else, would be great. Most models I have tried do not support managed instance scaling, so it's blocking me from writing a test case to enable this feature.
I have used example here - https://repost.aws/questions/QUODaQEyKNTbqWLYszAIYCIg/creating-jumpstart-sagemaker-endpoint-with-terraform-fails-with-model-needs-flash-attention
endpoint_configuration_test.go:162: Step 1/3 error: Error running apply: exit status 1
Error: creating SageMaker Endpoint Configuration: ValidationException: ManagedInstanceScaling is not supported with the given EndpointConfig setup.
status code: 400, request id: 32f1694c-6389-43f9-9bea-5245a1497bfd
with aws_sagemaker_endpoint_configuration.test,
on terraform_plugin_test.tf line 54, in resource "aws_sagemaker_endpoint_configuration" "test":
54: resource "aws_sagemaker_endpoint_configuration" "test" {
Isn't enabling the network isolation done in the SageMaker model and not the endpoint config?
Yes. When a model is specified in the endpoint config, VPC/subnet details and network isolation cannot also be specified there; the two are mutually exclusive. The endpoint config inherits the VPC config and network isolation from the model definition.
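To illustrate the split described above, here is a minimal sketch of the two request payloads as they might be passed to the SageMaker API through boto3. The helper names and the "AllTraffic" variant name are hypothetical; the keys (EnableNetworkIsolation on the model, ManagedInstanceScaling on the production variant) follow the CreateModel and CreateEndpointConfig request shapes. It only builds dictionaries and does not call AWS.

```python
from typing import Any, Dict


def build_model_request(name: str, image: str, role_arn: str) -> Dict[str, Any]:
    """Keyword arguments for a SageMaker CreateModel call (hypothetical helper)."""
    return {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {"Image": image},
        # Network isolation is set on the model, not on the endpoint config.
        "EnableNetworkIsolation": True,
    }


def build_endpoint_config_request(
    name: str,
    model_name: str,
    instance_type: str,
    min_instances: int,
    max_instances: int,
) -> Dict[str, Any]:
    """Keyword arguments for a CreateEndpointConfig call (hypothetical helper)."""
    return {
        "EndpointConfigName": name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # assumed variant name
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": min_instances,
                # Managed instance scaling is set per production variant.
                "ManagedInstanceScaling": {
                    "Status": "ENABLED",
                    "MinInstanceCount": min_instances,
                    "MaxInstanceCount": max_instances,
                },
            }
        ],
    }
```

These dictionaries could then be passed as `client.create_model(**model_request)` and `client.create_endpoint_config(**config_request)` on a boto3 SageMaker client. As the error output above shows, the API may still reject ManagedInstanceScaling for models that do not support it.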
Not sure if it helps, but at least it may help someone who stumbles upon this issue later. To answer the issue mentioned above of "the values are these jumpstart images and s3 locations are not published". I was able to retrieve these programmatically like this:
(VS Code Jupyter notebook script formatting)
# %%
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models
from sagemaker import image_uris, model_uris
# %%
region = "us-west-2" # Your region.
instance_type = "ml.g5.2xlarge" # Your desired instance type. Note image will be different for gpu vs cpu instances.
# %%
# find model_id for a given search string
[m for m in list_jumpstart_models(region=region) if "mistral" in m]
# %%
model_id = "huggingface-llm-mistral-7b-instruct"
# %%
# find latest version of model_id
[m for m in list_jumpstart_models(filter=f"model_id=={model_id}", list_versions=True, region=region)]
# %%
model_version = "3.1.0"
# %%
image_uris.retrieve(framework=None, instance_type=instance_type, image_scope="inference", model_id=model_id, model_version=model_version, region=region)
# %%
model_uris.retrieve(instance_type=instance_type, model_scope="inference", model_id=model_id, model_version=model_version, region=region)
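The other route mentioned in this issue, reading the values back from a model that Studio has already deployed, could be sketched along these lines. This is only an illustration: the function names and the output keys (image, environment, model_data_s3_uri, enable_network_isolation) are hypothetical and would need to match variables declared in your own Terraform module. The input is assumed to be a DescribeModel response, for example from `aws sagemaker describe-model` or boto3's `describe_model`.

```python
import json
from typing import Any, Dict


def model_settings_for_terraform(describe_model_response: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the model fields that a Terraform config would need to copy."""
    container = describe_model_response.get("PrimaryContainer", {})
    s3_source = container.get("ModelDataSource", {}).get("S3DataSource", {})
    return {
        # Key names below are hypothetical Terraform variable names.
        "image": container.get("Image"),
        "environment": container.get("Environment", {}),
        "model_data_s3_uri": s3_source.get("S3Uri"),
        "enable_network_isolation": describe_model_response.get(
            "EnableNetworkIsolation", False
        ),
    }


def to_tfvars_json(settings: Dict[str, Any]) -> str:
    """Serialize the settings as JSON suitable for a *.tfvars.json file."""
    return json.dumps(settings, indent=2, sort_keys=True)
```

Writing the result to a file ending in .auto.tfvars.json lets Terraform load the values automatically, provided the module declares variables with the same names.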
Terraform Core Version
1.6.5
AWS Provider Version
5.31.0
Affected Resource(s)
SageMaker endpoint configuration (aws_sagemaker_endpoint_configuration).
Expected Behavior
When creating a JumpStart endpoint through SageMaker Studio, you can deploy an LLM (like Mistral) on a managed endpoint. A few hacks are needed to get this to work with Terraform because the values of these JumpStart images and S3 locations are not published. But by deploying a model in Studio and then using the AWS CLI to get the model's primary_container.environment and model_data_source, Terraform can copy it.
The issue is that aws_sagemaker_endpoint_configuration cannot support the configuration that SageMaker Studio creates by default.
Here is the described endpoint configuration made by Studio:
Actual Behavior
With Terraform, it is not possible to specify ManagedInstanceScaling:
It is also not possible to specify NetworkIsolation.
This is the endpoint configuration created by Terraform:
Relevant Error/Panic Output Snippet
No response
Terraform Configuration Files
Steps to Reproduce
Run the standard terraform init, plan, and apply, then compare the endpoint configurations deployed by Terraform and by the SageMaker Studio UI.
Debug Output
No response
Panic Output
No response
Important Factoids
No response
References
No response
Would you like to implement a fix?
None