Closed by abhimasand 11 months ago
Seems like you are missing permissions.
I am using an admin role, and I have checked that it has all the required S3 and SageMaker permissions. However, I will double-check that.
What confuses me is how it deployed the first time, right after training. If it were a permission issue, it shouldn't have deployed then either.
Your error says role "" exists, with an empty string. Maybe it's not being passed correctly.
I apologize for the confusion. I had omitted some details from the model URI before posting.
I have solved the problem now. I realized that there was a confusing cell in the notebook.
There was a cell that had this code:
huggingface_estimator.model_data["S3DataSource"]["S3Uri"].replace("s3://", "https://s3.console.aws.amazon.com/s3/buckets/")
But this code does not change the original value of huggingface_estimator.model_data["S3DataSource"]["S3Uri"]. It only returns a new string. I was wrong to think that this code would modify the value.
The fix was to use the S3 URI, with its "s3://" prefix, as the model_s3_path instead of the console URL.
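For anyone hitting the same thing, here is a minimal sketch of why that cell does not change the stored value. It uses a plain dict as a stand-in for huggingface_estimator.model_data, and the bucket name example-bucket is made up for illustration; only the key layout and the replace call come from the notebook cell above.

```python
# Hypothetical stand-in for huggingface_estimator.model_data.
# "example-bucket" is an invented bucket name, not the real one.
model_data = {
    "S3DataSource": {
        "S3Uri": "s3://example-bucket/huggingface-qlora-mistralai-Mistral-7B--2023-10-06-11-27-09-016/output/model/",
    }
}

s3_uri = model_data["S3DataSource"]["S3Uri"]

# Python strings are immutable: str.replace returns a new string
# and leaves both s3_uri and the dict entry unchanged.
console_url = s3_uri.replace("s3://", "https://s3.console.aws.amazon.com/s3/buckets/")

# The value to use for deployment is still the original s3:// URI.
model_s3_path = model_data["S3DataSource"]["S3Uri"]

print(console_url)    # link for viewing the artifacts in the browser
print(model_s3_path)  # path to pass as the model location
```

So the console URL is only useful for opening the bucket in a browser, while the unchanged "s3://" value is what should go into model_s3_path.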
Hi @philschmid,
Thanks for making this repo, it was a huge help! I successfully trained the model and deployed it to a SageMaker endpoint. However, after I deleted the endpoint when I was done with it, I could not recreate it.
For context, I manually retrieved the S3 URL of my model and used it as the model S3 path.
I cannot figure out why the deployment fails, even though the S3 path points to the correct location and my role has all the required permissions.
I get the following error:
ClientError: An error occurred (ValidationException) when calling the CreateModel operation: Could not access model data at/huggingface-qlora-mistralai-Mistral-7B--2023-10-06-11-27-09-016/output/model/. Please ensure that the role "" exists and that its trust relationship policy allows the action "sts:AssumeRole" for the service principal "sagemaker.amazonaws.com". Also ensure that the role has "s3:GetObject" permissions and that the object is located in eu-west-1. If your Model uses multiple models or uncompressed models, please ensure that the role has "s3:ListBucket" permission.
Truly would appreciate your help!