Open sdkramer10 opened 1 year ago
You have to use the LLM container; see here for an example with Falcon: https://www.philschmid.de/sagemaker-llm-vpc
I tried that first, and it didn't work either. Using the LLM container (version 0.8.2) results in this error in the endpoint:
FileNotFoundError: No local weights found in /opt/ml/model with extension .bin
It seems like it's the same issue these people are seeing with Falcon - https://discuss.huggingface.co/t/qlora-trained-llama2-13b-deployment-error-on-sagemaker-using-text-generation-inference-image/48154
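A quick way to reproduce the failing check locally, before uploading a model archive, is a sketch like the one below. It assumes (based only on the error message, not the actual TGI source) that the container globs the model directory for files with a `.bin` extension; the helper name is mine. If the archive only contains adapter weights or safetensors, the list comes back empty and the container raises the FileNotFoundError above.

```python
from pathlib import Path

def find_local_weights(model_dir: str, extension: str = ".bin") -> list:
    """Sketch of the container's weight discovery: look for files with
    the given extension directly in the model directory. Returns an
    empty list when only adapter/safetensors files are present, which
    is the condition behind the FileNotFoundError."""
    return sorted(p.name for p in Path(model_dir).glob(f"*{extension}"))
```

Running this against the unpacked model.tar.gz before deployment shows immediately whether the endpoint will find any `.bin` weights.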
I followed the instructions for training, and it works great. However, I now want to deploy my fine-tuned model to a SageMaker endpoint, but I get the following error when I run predict() on the endpoint.
Here is my inference code: