Closed rhoentier closed 5 months ago
Can you please share the code you used to deploy?
import json

from sagemaker.huggingface import HuggingFaceModel

role = "XXX"
model_s3_location = "XXX"
endpoint_name = "XXX"
instance_type = "ml.g5.48xlarge"
number_of_gpu = 8
health_check_timeout = 300

config = {
    "SM_NUM_GPUS": json.dumps(number_of_gpu),
}

huggingface_model = HuggingFaceModel(
    model_data=model_s3_location,
    image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.3-gpu-py310-cu121-ubuntu20.04-v1.0",
    role=role,
    env=config,
)

llm = huggingface_model.deploy(
    endpoint_name=endpoint_name,
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,
)
Can you please check here https://www.philschmid.de/sagemaker-mistral#4-deploy-fine-tuned-mistral-7b-on-amazon-sagemaker
You need to define more variables in the config so TGI knows where your model is stored.
My model is inside a tar file. Do I need to define the variable as well? Various examples from Hugging Face or AWS only use model_data.
The blog post describes your use case.
Thanks for your help! We needed the HF_MODEL_ID param. We also had to set the model_data_download_timeout param in the deploy function.
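For anyone landing here later, a minimal sketch of what the fix amounts to (the S3 path and timeout values are placeholders, not the thread author's actual values): when the model ships as a tarball via model_data, SageMaker extracts it to /opt/ml/model inside the container, so HF_MODEL_ID should point there, and model_data_download_timeout in deploy() should be raised so a large archive has time to download.

```python
import json

# Placeholder values -- substitute your own.
model_s3_location = "s3://my-bucket/model.tar.gz"
number_of_gpu = 8

# HF_MODEL_ID points at /opt/ml/model, where SageMaker extracts
# the tarball referenced by model_data inside the container.
config = {
    "HF_MODEL_ID": "/opt/ml/model",
    "SM_NUM_GPUS": json.dumps(number_of_gpu),
}

# Keyword arguments for huggingface_model.deploy(...). The timeout
# values (in seconds) are illustrative; tune them to your model size.
deploy_kwargs = {
    "initial_instance_count": 1,
    "instance_type": "ml.g5.48xlarge",
    "container_startup_health_check_timeout": 600,
    "model_data_download_timeout": 1200,
}
```

The same env dict is then passed as env=config to HuggingFaceModel, exactly as in the snippet above.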
I get a very non-specific error when deploying Mixtral to SageMaker:
Traceback (most recent call last):
  File "XXX", line 47, in <module>
    huggingface_model.deploy(
  File "XXX", line 315, in deploy
    return super(HuggingFaceModel, self).deploy(
  File "/XXX", line 1654, in deploy
    self.sagemaker_session.endpoint_from_production_variants(
  File "/XXX", line 5380, in endpoint_from_production_variants
    return self.create_endpoint(
  File "XXX", line 4291, in create_endpoint
    self.wait_for_endpoint(endpoint_name, live_logging=live_logging)
  File "XXX", line 5023, in wait_for_endpoint
    raise exceptions.UnexpectedStatusException(
sagemaker.exceptions.UnexpectedStatusException: Error hosting endpoint XXX: Failed. Reason: Request to service failed. If failure persists after retry, contact customer support.. Try changing the instance type or reference the troubleshooting page https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-troubleshooting.html
AWS has not created a log group in CloudWatch at this time.
Is anyone else experiencing the same problem?