Closed rhoentier closed 5 months ago
Can you please share the code you used to deploy?
import json

from sagemaker.huggingface import HuggingFaceModel

role = "XXX"
model_s3_location = "XXX"
endpoint_name = "XXX"
instance_type = "ml.g5.48xlarge"
number_of_gpu = 8
health_check_timeout = 300

config = {
    "SM_NUM_GPUS": json.dumps(number_of_gpu),
}

huggingface_model = HuggingFaceModel(
    model_data=model_s3_location,
    image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.3-gpu-py310-cu121-ubuntu20.04-v1.0",
    role=role,
    env=config,
)

llm = huggingface_model.deploy(
    endpoint_name=endpoint_name,
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,
)
Can you please check here https://www.philschmid.de/sagemaker-mistral#4-deploy-fine-tuned-mistral-7b-on-amazon-sagemaker
You need to define more variables in the config so TGI knows where your model is stored.
My model is inside a tar file. Do I need to define the variable as well? Various examples from Hugging Face or AWS only use model_data.
The blog post describes your use case.
Thanks for your help! We needed the HF_MODEL_ID param. We also had to set the model_data_download_timeout param in the deploy function.
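For anyone landing here later, a minimal sketch of what the fix amounts to (the S3 path and timeout values are placeholders, not the thread author's actual values): when the model ships as a tarball via model_data, SageMaker extracts it to /opt/ml/model inside the container, so HF_MODEL_ID should point there, and model_data_download_timeout in deploy() should be raised so a large archive has time to download.

```python
import json

# Placeholder values -- substitute your own.
model_s3_location = "s3://my-bucket/model.tar.gz"
number_of_gpu = 8

# HF_MODEL_ID points at /opt/ml/model, where SageMaker extracts
# the tarball referenced by model_data inside the container.
config = {
    "HF_MODEL_ID": "/opt/ml/model",
    "SM_NUM_GPUS": json.dumps(number_of_gpu),
}

# Keyword arguments for huggingface_model.deploy(...). The timeout
# values (in seconds) are illustrative; tune them to your model size.
deploy_kwargs = {
    "initial_instance_count": 1,
    "instance_type": "ml.g5.48xlarge",
    "container_startup_health_check_timeout": 600,
    "model_data_download_timeout": 1200,
}
```

The same env dict is then passed as env=config to HuggingFaceModel, exactly as in the snippet above.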
I get a very non-specific error when deploying Mixtral to SageMaker:
Traceback (most recent call last):
  File "XXX", line 47, in <module>
    huggingface_model.deploy(
  File "XXX", line 315, in deploy
    return super(HuggingFaceModel, self).deploy(
  File "/XXX", line 1654, in deploy
    self.sagemaker_session.endpoint_from_production_variants(
  File "/XXX", line 5380, in endpoint_from_production_variants
    return self.create_endpoint(
  File "XXX", line 4291, in create_endpoint
    self.wait_for_endpoint(endpoint_name, live_logging=live_logging)
  File "XXX", line 5023, in wait_for_endpoint
    raise exceptions.UnexpectedStatusException(
sagemaker.exceptions.UnexpectedStatusException: Error hosting endpoint XXX: Failed. Reason: Request to service failed. If failure persists after retry, contact customer support.. Try changing the instance type or reference the troubleshooting page https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-troubleshooting.html
AWS has not created a log group in CloudWatch at this time.
Is anyone else experiencing the same problem?