Closed daavoo closed 10 months ago
I don't seem to have access:
$ python src/endpoint_prediction.py \
--img_path data/test_data/REGION_1-24_0_1024_0_1024.jpg \
--endpoint_name results-train-pool-segmentation-v0-1-0-dev
Traceback (most recent call last):
File "/Users/dave/Code/example-get-started-experiments/src/endpoint_prediction.py", line 53, in <module>
endpoint_prediction(args.img_path, args.endpoint_name, args.output_path)
File "/Users/dave/Code/example-get-started-experiments/src/endpoint_prediction.py", line 32, in endpoint_prediction
result = predictor.predict(img_bytes)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dave/micromamba/envs/example-get-started-experiments/lib/python3.11/site-packages/sagemaker/base_predictor.py", line 185, in predict
response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dave/micromamba/envs/example-get-started-experiments/lib/python3.11/site-packages/botocore/client.py", line 530, in _api_call
return self._make_api_call(operation_name, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dave/micromamba/envs/example-get-started-experiments/lib/python3.11/site-packages/botocore/client.py", line 964, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ExpiredTokenException) when calling the InvokeEndpoint operation: The security token included in the request is expired
(example-get-started-experiments) dave@davids-air:~/Code/example-get-started-experiments [main] 13:32:25
$ ~/sts.sh 363718
Configuring AWS with token 363718
(example-get-started-experiments) dave@davids-air:~/Code/example-get-started-experiments [main] 13:32:36
$ python src/endpoint_prediction.py \
--img_path data/test_data/REGION_1-24_0_1024_0_1024.jpg \
--endpoint_name results-train-pool-segmentation-v0-1-0-dev
Traceback (most recent call last):
File "/Users/dave/Code/example-get-started-experiments/src/endpoint_prediction.py", line 53, in <module>
endpoint_prediction(args.img_path, args.endpoint_name, args.output_path)
File "/Users/dave/Code/example-get-started-experiments/src/endpoint_prediction.py", line 32, in endpoint_prediction
result = predictor.predict(img_bytes)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dave/micromamba/envs/example-get-started-experiments/lib/python3.11/site-packages/sagemaker/base_predictor.py", line 185, in predict
response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dave/micromamba/envs/example-get-started-experiments/lib/python3.11/site-packages/botocore/client.py", line 530, in _api_call
return self._make_api_call(operation_name, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dave/micromamba/envs/example-get-started-experiments/lib/python3.11/site-packages/botocore/client.py", line 964, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDeniedException) when calling the InvokeEndpoint operation: User: arn:aws:iam::260760892802:user/dave is not authorized to perform: sagemaker:InvokeEndpoint on resource: arn:aws:sagemaker:us-east-2:260760892802:endpoint/results-train-pool-segmentation-v0-1-0-dev because no identity-based policy allows the sagemaker:InvokeEndpoint action
What is our plan here? Not a blocker, but do we want to work towards making it public?
I don't seem to have access:
Can you try with the Sandbox account? Also, make sure you set us-east-2 as the AWS region when querying.
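For anyone reproducing this locally, both suggestions above (Sandbox credentials and the us-east-2 region) can be applied through environment variables before running the script. This is a minimal sketch, assuming the Sandbox account is set up as a named AWS profile called `sandbox`. That profile name is hypothetical; use whatever your own AWS config defines.

```python
import os


def sandbox_env(profile="sandbox", region="us-east-2", base=None):
    """Return a copy of the environment with the AWS profile and region set.

    `profile` is a hypothetical profile name (an assumption, not from the
    thread); `region` is the one mentioned in this thread.
    """
    env = dict(os.environ if base is None else base)
    # Standard variables read by the AWS CLI and boto3.
    env["AWS_PROFILE"] = profile
    env["AWS_DEFAULT_REGION"] = region
    return env


# Small demonstration on a toy base environment.
env = sandbox_env(base={"PATH": "/usr/bin"})
print(env["AWS_PROFILE"], env["AWS_DEFAULT_REGION"])
```

The result can be passed as `env=` to `subprocess.run` when launching `src/endpoint_prediction.py`. Note that access keys already exported in the shell may take precedence over the profile, so it can help to start from a clean terminal.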
What is our plan here? Not a blocker, but do we want to work towards making it public?
I assume we don't want to make the actual endpoint public, but rather a simple UI that queries the endpoint. I was assuming that, for now, we would be using it for live demos and using the sandbox account.
Can you try with the Sandbox account? Also, make sure you set us-east-2 as AWS region when querying
Thanks, that helped, but now I'm getting a timeout error:
$ python src/endpoint_prediction.py \
--img_path data/test_data/REGION_1-24_0_1024_0_1024.jpg \
--endpoint_name results-train-pool-segmentation-v0-1-0-dev
Traceback (most recent call last):
File "/Users/dave/Code/example-get-started-experiments/src/endpoint_prediction.py", line 53, in <module>
endpoint_prediction(args.img_path, args.endpoint_name, args.output_path)
File "/Users/dave/Code/example-get-started-experiments/src/endpoint_prediction.py", line 32, in endpoint_prediction
result = predictor.predict(img_bytes)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dave/micromamba/envs/example-get-started-experiments/lib/python3.11/site-packages/sagemaker/base_predictor.py", line 185, in predict
response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dave/micromamba/envs/example-get-started-experiments/lib/python3.11/site-packages/botocore/client.py", line 530, in _api_call
return self._make_api_call(operation_name, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dave/micromamba/envs/example-get-started-experiments/lib/python3.11/site-packages/botocore/client.py", line 964, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Your invocation timed out while waiting for a response from model container. Review the latency metrics in Amazon CloudWatch, resolve the issue, and try again.". See https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=/aws/sagemaker/Endpoints/results-train-pool-segmentation-v0-1-0-dev in account 342840881361 for more information.
I also see errors in the logs in https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logsV2:log-groups/log-group/$252Faws$252Fsagemaker$252FEndpoints$252Fresults-train-pool-segmentation-v0-1-0-dev/log-events/AllTraffic$252F44267aee8024d8ef1612febe258e9378-08a54f8ef3504be4b8e6e736d1e78a67.
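Since three different failures show up in this thread (expired token, access denied, model timeout), here is a small sketch of how the script could map them to a next step. It only relies on the `response["Error"]["Code"]` field that botocore's `ClientError` carries. The helper name `error_hint` and the hint texts are made up for illustration and are not part of `endpoint_prediction.py`.

```python
# Error codes seen in this thread, mapped to a suggested next step.
HINTS = {
    "ExpiredTokenException": "Session token expired; refresh credentials and retry.",
    "AccessDeniedException": "Identity lacks sagemaker:InvokeEndpoint; try another account or role.",
    "ModelError": "Model container failed or timed out; check the endpoint's CloudWatch logs.",
}


def error_hint(response):
    """Return a hint for a botocore error response dict (hypothetical helper)."""
    code = response.get("Error", {}).get("Code", "")
    return HINTS.get(code, f"Unhandled error code: {code or 'unknown'}")


# In the script this would be called as:
#     except botocore.exceptions.ClientError as err:
#         print(error_hint(err.response))
print(error_hint({"Error": {"Code": "ExpiredTokenException"}}))
```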
I will take a look tomorrow. I tested it with a different instance type, and I assume that the current serverless configuration is too small.
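If the serverless configuration does turn out to be undersized, one option is to move to the next memory tier. A minimal sketch, assuming the standard SageMaker serverless memory sizes (1024 to 6144 MB in 1 GB steps). The current size in the example is hypothetical, since the thread does not say what the endpoint uses.

```python
# Memory sizes accepted for SageMaker serverless endpoints, in MB.
SERVERLESS_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def next_memory_size(current_mb):
    """Return the smallest allowed memory size strictly above `current_mb`."""
    for size in SERVERLESS_MEMORY_MB:
        if size > current_mb:
            return size
    raise ValueError(
        f"{current_mb} MB is already at or above the largest serverless size"
    )


# Hypothetical current value; not taken from the actual deployment.
print(next_memory_size(2048))
```

The returned value would go into `memory_size_in_mb` of `sagemaker.serverless.ServerlessInferenceConfig`. If the largest size still times out, a regular instance-backed endpoint is probably the better fit.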
@daavoo Looks like this PR is close to getting merged. Since this uses one of our official demo repos, we could use this in the blog post instead of the demo-fashion-mnist that I have currently used. wdyt? I can try to replace the example snippets in the blog post to use your snippets. And you might wanna rewrite some of the text. We'll not have a web UI, but that should be ok.
Since this uses one of our official demo repos, we could use this in the blog post instead of the demo-fashion-mnist that I have currently used. wdyt?
Makes sense to me. I would perhaps also use the opportunity to cut the scope of the post a little by dropping DVC details in favor of pointers to the dvc get-started pages
Ok. I'll share an updated version of the blog post tomorrow. @shcheklein FYI since we were discussing this earlier this morning.
Merging as the endpoint is now working. Don't hesitate to open follow-ups.
Agree with @tapadipti that it makes sense to have one endpoint per stage or per version. Otherwise, I think we kind of miss the point of the registry (you can deploy every update to a new endpoint without it). IMO one endpoint per stage makes the most sense to drive home the value of that field, and I think we should focus on this being a self-contained deployment (you can do deployment without needing a separate engineering team to pick up the new model endpoint).
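To make the two options concrete, here is a sketch of how endpoint names could be derived for "one endpoint per version" versus "one endpoint per stage". The function and its naming scheme are assumptions for illustration, not what the deploy workflow actually does, although the per-version form reproduces the endpoint name used earlier in this thread.

```python
import re


def endpoint_name(model, stage, version=None):
    """Build an endpoint name (hypothetical scheme).

    With `version`, every version gets its own endpoint; without it,
    each stage has a single endpoint that new versions overwrite.
    """
    parts = [model, version, stage] if version else [model, stage]
    # SageMaker endpoint names allow only alphanumerics and hyphens.
    name = re.sub(r"[^A-Za-z0-9]+", "-", "-".join(parts)).strip("-")
    if len(name) > 63:
        raise ValueError(f"endpoint name too long ({len(name)} > 63): {name}")
    return name


per_version = endpoint_name("results-train-pool-segmentation", "dev", "v0.1.0")
per_stage = endpoint_name("results-train-pool-segmentation", "dev")
print(per_version)
print(per_stage)
```

With the per-stage form, consumers keep calling the same endpoint while the registry decides which version sits behind it, which is the self-contained deployment story described above.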
Add SageMaker deployment.
https://github.com/iterative/example-get-started-experiments/actions/workflows/deploy-model.yml
https://us-east-2.console.aws.amazon.com/sagemaker/home?region=us-east-2#/endpoints/results-train-pool-segmentation-v0-1-0-dev