Closed by deepankarm, 1 year ago.
Patch coverage: 29.69%; project coverage change: -3.44%. Comparison is base (d4343d0) 77.14% compared to head (812667b) 73.70%.
Goals
The PR enables using an Executor as a SageMaker custom container for inference by implementing two endpoints:

- `POST /invocations`
- `GET /ping`

and accepting a `serve` command on port 8080.

To separate SageMaker inference, I've added a `--provider [NONE|SAGEMAKER]` argument to the CLI, which can be expanded later to `azureml`, `vertexai`, etc. When the provider is `sagemaker`, we start a custom FastAPI app that implements the `GET /ping` endpoint.

- If an `/invocations` route is already added to the Executor, we add it to the FastAPI app.
- If `/invocations` is not added and only one route is added to the Executor, we add an `/invocations` route that points to the route the Executor implements.

SageMaker needs the image to be pushed to ECR. To use an already pushed Executor with SageMaker, one can do the following and push the image.
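For reference, the container contract SageMaker expects can be sketched with a dependency-free standard-library server. This is an illustrative sketch only, not the actual Jina/FastAPI implementation: the handler class, the echo behavior of `/invocations`, and the JSON payload shape are assumptions; only the two routes, status codes, and port 8080 come from the contract described above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class SageMakerHandler(BaseHTTPRequestHandler):
    """Illustrative handler for the SageMaker container contract."""

    def do_GET(self):
        # GET /ping: health check SageMaker polls before routing traffic;
        # any 2xx response means the container is healthy.
        if self.path == "/ping":
            self.send_response(200)
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()

    def do_POST(self):
        # POST /invocations: inference requests are forwarded here.
        if self.path == "/invocations":
            length = int(self.headers.get("Content-Length", 0))
            payload = self.rfile.read(length)
            # A real container would run the Executor here; we just echo.
            body = json.dumps({"echo": payload.decode() or None}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        # Silence per-request logging in this sketch.
        pass


def make_server(port: int = 8080) -> HTTPServer:
    # SageMaker expects the container to listen on port 8080.
    return HTTPServer(("0.0.0.0", port), SageMakerHandler)
```

In the PR itself, the `GET /ping` route lives in the custom FastAPI app and `/invocations` is wired to the Executor's route, as described above.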
During run, SageMaker adds `serve` as a CMD to the container entrypoint. I haven't added any arguments to the Jina CLI for this, as `serve` is ignored (if we follow the above syntax in the entrypoint). While implementing other providers, I'll evaluate the need for the `serve` command.

End-to-end test
I've manually tested the Executor after pushing a model to S3 and the Executor image to ECR, and by running real-time and serverless inference via SageMaker endpoints.
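For reference, real-time invocation of such an endpoint can be sketched with the boto3 SageMaker runtime client. `invoke_endpoint` is the real boto3 API (it forwards the body to the container's `POST /invocations`); the endpoint name, the JSON content type, and the wrapper function itself are assumptions for illustration, with the client passed in as a parameter so it can be substituted:

```python
def invoke_realtime(client, endpoint_name: str, payload: bytes) -> bytes:
    """Send one real-time inference request to a SageMaker endpoint.

    `client` is a boto3 "sagemaker-runtime" client, i.e. the result of
    boto3.client("sagemaker-runtime"); it is injected rather than created
    here so the sketch stays dependency-free.
    """
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,      # assumed endpoint name
        ContentType="application/json",  # assumed payload format
        Body=payload,
    )
    # response["Body"] is a streaming object; read() yields the raw bytes
    # the container wrote from its /invocations route.
    return response["Body"].read()
```

Serverless inference endpoints are invoked with the same `invoke_endpoint` call; only the endpoint configuration differs.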
Pending
Support for batch-transform jobs is not added in this PR; it will follow in a separate PR.