philschmid / llm-sagemaker-sample

Apache License 2.0

notebooks/deploy-mixtral.ipynb issue #8

Open existme opened 11 months ago

existme commented 11 months ago

This is not really an issue, but I couldn't find any other way to contact you. I was trying to follow your instructions on https://www.philschmid.de/sagemaker-deploy-mixtral and ended up in this repository.

I tried to follow the deployment instructions, but the deployment was not successful. I got the following error logs on the inference endpoint:

2023-12-15T20:06:10.216+01:00
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 161, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 310, in get_model
    return FlashMixtral(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mixtral.py", line 21, in __init__
    super(FlashMixtral, self).__init__(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mistral.py", line 318, in __init__
    SLIDING_WINDOW_BLOCKS = math.ceil(config.sliding_window / BLOCK_SIZE)
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
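For context, Mixtral's config.json sets `sliding_window` to `null`, so TGI 1.3.1 ends up computing `math.ceil(None / BLOCK_SIZE)`, which is exactly the `TypeError` above. A minimal sketch of the failure and the kind of guard a fix would add (the `BLOCK_SIZE` value and the fallback length are illustrative, not TGI's exact code):

```python
import math

BLOCK_SIZE = 16  # illustrative value, not necessarily TGI's


def sliding_window_blocks(sliding_window):
    # Mirrors the failing line: no None check before dividing.
    return math.ceil(sliding_window / BLOCK_SIZE)


def sliding_window_blocks_fixed(sliding_window, max_total_tokens=32768):
    # Sketch of a guard: fall back to a model max length when
    # sliding_window is null in config.json (as it is for Mixtral).
    if sliding_window is None:
        sliding_window = max_total_tokens
    return math.ceil(sliding_window / BLOCK_SIZE)
```

Calling `sliding_window_blocks(None)` reproduces the `TypeError`, while the guarded version returns a block count.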

The HF image that I ended up using was 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.1-gpu-py310-cu121-ubuntu20.04

Looking into TGI issues, I found this thread. It seems to be fixed by a commit mentioned there, but I don't know how I can get the latest DLC image (1.3.3) for a SageMaker deployment, because when I specify the version in image_uris.retrieve or in get_huggingface_llm_image_uri, it complains:

ValueError: Unsupported huggingface-llm version: 1.3.3. You may need to upgrade your SDK version (pip install -U sagemaker) for newer huggingface-llm versions. Supported huggingface-llm version(s): 0.6.0, 0.8.2, 0.9.3, 1.0.3, 1.1.0, 1.2.0, 1.3.1, 0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3. 

I don't know the procedure for getting the latest version into the AWS ECR DLC registry, or how we can use a custom-built DLC image when deploying to SageMaker. Can you help in any way, or can you explain how your deployment works?
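One possible workaround (a sketch, not verified against this repo's notebook): the sagemaker SDK lets you bypass the version lookup entirely by passing a full ECR image URI to `HuggingFaceModel` via `image_uri`. The model ID and instance type below are assumptions:

```python
# Full image URI pinned by hand instead of resolved via
# get_huggingface_llm_image_uri, which rejects 1.3.3 on older SDK versions.
image_uri = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-tgi-inference:"
    "2.1.1-tgi1.3.3-gpu-py310-cu121-ubuntu20.04-v1.0"
)

# The deploy call itself needs AWS credentials, so it is sketched in comments:
# from sagemaker.huggingface import HuggingFaceModel
# model = HuggingFaceModel(
#     image_uri=image_uri,  # skips the SDK's supported-version list
#     role=role,            # your SageMaker execution role
#     env={"HF_MODEL_ID": "mistralai/Mixtral-8x7B-Instruct-v0.1"},
# )
# model.deploy(initial_instance_count=1, instance_type="ml.g5.48xlarge")
```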

Thanks in advance

LvffY commented 11 months ago

I just opened an issue on SageMaker itself, because I think it's the SageMaker SDK that is limiting the supported versions.

existme commented 11 months ago

Thank you for taking the time to create the issue :pray: I hope it gets the needed attention.

existme commented 11 months ago

@LvffY, by the way, do you know any other way of deploying the model as an inference? I want to try the model on AWS, but so far, I found no way to do that.

sdkramer10 commented 11 months ago

Thanks for adding the ticket! I am also blocked by this issue.

LvffY commented 11 months ago

> @LvffY, by the way, do you know any other way of deploying the model as an inference? I want to try the model on AWS, but so far, I found no way to do that.

@existme Not at this time.

rhoentier commented 10 months ago

I ran into the same problem.

Hugging Face has released a newer version of the image, which is accessible via SageMaker: 763104351884.dkr.ecr.eu-central-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.3-gpu-py310-cu121-ubuntu20.04-v1.0
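If you need that image in another region, only the ECR hostname changes while the tag stays the same. A small helper, assuming the DLC naming scheme shown in the URI above:

```python
def tgi_image_uri(
    region: str,
    tag: str = "2.1.1-tgi1.3.3-gpu-py310-cu121-ubuntu20.04-v1.0",
) -> str:
    # 763104351884 is the account id that hosts the AWS Deep Learning
    # Containers (the same one appearing in the URIs earlier in this thread).
    return (
        f"763104351884.dkr.ecr.{region}.amazonaws.com/"
        f"huggingface-pytorch-tgi-inference:{tag}"
    )
```

The result can then be passed to `HuggingFaceModel(image_uri=...)` instead of calling `get_huggingface_llm_image_uri`.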