philschmid / sagemaker-huggingface-llama-2-samples


Error running trained model on sagemaker endpoint #9

Open · sdkramer10 opened this issue 1 year ago

sdkramer10 commented 1 year ago

I followed the instructions for training, and it works great. However, I now want to deploy my fine-tuned model to a sagemaker endpoint, but I get the following error when I run predict() on the endpoint.

"Could not load model /opt/ml/model with any of the following classes: (\u003cclass \u0027transformers.models.auto.modeling_auto.AutoModelForCausalLM\u0027\u003e, \u003cclass \u0027transformers.models.llama.modeling_llama.LlamaForCausalLM\u0027\u003e)."

Here is my inference code:

from sagemaker.huggingface import HuggingFaceModel

llm_model = HuggingFaceModel(
  role=role,
  transformers_version='4.28',
  pytorch_version='2.0',
  py_version='py310',
  sagemaker_session=sess,
  model_data=<path to the model.tar.gz>,
)

llm = llm_model.deploy(
  initial_instance_count=1,
  instance_type="ml.g5.12xlarge",
)

response = llm.predict({"inputs": prompt})
philschmid commented 1 year ago

You have to use the LLM container; see here for an example with Falcon: https://www.philschmid.de/sagemaker-llm-vpc
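
For reference, a minimal sketch of that approach, loosely following the linked Falcon example; the container version and the environment values below are assumptions to adapt, not verified settings:

import json
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Resolve the Hugging Face LLM (TGI) container image for this region.
llm_image = get_huggingface_llm_image_uri("huggingface", version="0.8.2")

llm_model = HuggingFaceModel(
  role=role,
  image_uri=llm_image,                     # use the LLM container, not the default one
  model_data=<path to the model.tar.gz>,
  sagemaker_session=sess,
  env={
    "HF_MODEL_ID": "/opt/ml/model",        # load weights from the unpacked archive
    "SM_NUM_GPUS": json.dumps(4),          # ml.g5.12xlarge has 4 GPUs
    "MAX_INPUT_LENGTH": json.dumps(1024),  # assumed limits, tune for your use case
    "MAX_TOTAL_TOKENS": json.dumps(2048),
  },
)

llm = llm_model.deploy(
  initial_instance_count=1,
  instance_type="ml.g5.12xlarge",
)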

sdkramer10 commented 1 year ago

> You have to use the LLM container; see here for an example with Falcon: https://www.philschmid.de/sagemaker-llm-vpc

I tried that first, and it didn't work either. Using the LLM container (version 0.8.2) results in this error in the endpoint:

FileNotFoundError: No local weights found in /opt/ml/model with extension .bin

sdkramer10 commented 1 year ago

It seems to be the same issue these people are seeing with Falcon: https://discuss.huggingface.co/t/qlora-trained-llama2-13b-deployment-error-on-sagemaker-using-text-generation-inference-image/48154
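
A likely cause, going by that thread: QLoRA/PEFT training saves only the LoRA adapter weights (and/or safetensors), while the 0.8.2 LLM container looks for full pytorch_model*.bin files in /opt/ml/model. A minimal sketch of merging the adapter into the base model before repackaging the archive; the paths here are illustrative:

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model with the trained adapter applied (illustrative local path).
model = AutoPeftModelForCausalLM.from_pretrained(
  "model-checkpoint/",
  torch_dtype=torch.float16,
  low_cpu_mem_usage=True,
)

# Fold the LoRA weights into the base model.
merged_model = model.merge_and_unload()

# Write full .bin weights so the container finds local weights with extension .bin.
merged_model.save_pretrained("merged-model/", safe_serialization=False)

# Keep the tokenizer alongside the merged weights.
tokenizer = AutoTokenizer.from_pretrained("model-checkpoint/")
tokenizer.save_pretrained("merged-model/")

Repackaging merged-model/ into a model.tar.gz and pointing model_data at it should then give the endpoint the .bin weights it expects.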