triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Unable to load/unload models through SageMaker + Triton #6990

Open jadhosn opened 6 months ago

jadhosn commented 6 months ago

Description: sagemaker_server.cc exposes loading/unloading models through an HTTP POST request to SageMaker. I'm unable to load or unload models through SageMaker for Triton. I'm currently testing locally, but eventually want to run the same setup on a SageMaker endpoint.

I opened the issue on this repo since sagemaker_server.cc lives here, even though the problem is directly linked to SageMaker.

Triton Information 22.12-pyt-python-py3

Are you using the Triton container or did you build it yourself? Using a Triton container, with the necessary predefined environment variables for SageMaker, and exposing port 8080 for the SageMaker server. We also set SAGEMAKER_MULTI_MODEL=true as required for listing models.
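For reference, a minimal sketch of that kind of local setup (the image name, volume path, and entrypoint below are placeholders/assumptions, not the exact command from this issue):

docker run --rm -p 8080:8080 \
  -e SAGEMAKER_MULTI_MODEL=true \
  -v ${PWD}/models:/opt/ml/models \
  my-sagemaker-tritonserver:22.12-pyt-python-py3 \
  serve   # SageMaker hosting starts containers with the "serve" argument (see docker/sagemaker/serve)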

To Reproduce

tritonserver --log-verbose=true --allow-sagemaker=true --allow-grpc=false --allow-http=true --allow-metrics=true --model-control-mode=explicit --model-repository /opt/ml/models

Running the Triton + SageMaker test (this test) fails.

Addressing SageMaker directly through port 8080, according to this documentation, loading a model expects the following request:

POST http://localhost:8080/models/
Content-Type: application/json

{
  "model_name": "my-model",
  "url": "/opt/ml/models/<my-hashed-name>/model"
}
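An equivalent curl call, assuming the same host, port, and request body as above:

curl -X POST http://localhost:8080/models/ \
  -H 'Content-Type: application/json' \
  -d '{"model_name": "my-model", "url": "/opt/ml/models/<my-hashed-name>/model"}'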

Returns the following error:

{"error":"failed to register '', repository not found"}

Expected behavior: loading/unloading models through SageMaker works as expected.

lkomali commented 6 months ago

cc @dyastremsky @rmccorm4

nikhil-sk commented 6 months ago

@jadhosn

Could you share more about the objective you are trying to achieve, and the exact failure you are seeing?

Note that in MME mode, SageMaker handles model loading and unloading on behalf of the customer; the customer can only choose to create, invoke, or delete the endpoint. The model is loaded on demand when the endpoint is invoked with TargetModel=xyz.tar.gz.
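For example, a sketch of an MME invocation that triggers such an on-demand load (the endpoint name, payload file, and content type are placeholders; only the archive name comes from the comment above):

aws sagemaker-runtime invoke-endpoint \
  --endpoint-name my-mme-endpoint \
  --target-model xyz.tar.gz \
  --content-type application/json \
  --body fileb://payload.json \
  output.json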

jadhosn commented 6 months ago

Could you share more about the objective you are trying to achieve, and the exact failure you are seeing?

@nskool, I'm looking for the ability to unload models from GPU memory on command on SageMaker. Triton already supports loading and unloading models through POST requests. For clarity, I'm not referring to deleting the downloaded model from temporary storage; I want to explicitly unload a model from GPU memory, through SageMaker.
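For what it's worth, the SageMaker MME container contract that sagemaker_server.cc implements also defines an unload route, so when driving the container directly (as in the reproduction above) an explicit unload would look something like this (model name is a placeholder, and this assumes the standard DELETE /models/<name> route applies):

curl -X DELETE http://localhost:8080/models/my-model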

inf3rnus commented 5 months ago

@jadhosn I've got the solution

Run docker like so:

docker run --rm --net=host -v ${PWD}/repos:/repos triton_server_aws_cpu:0.0.1 tritonserver --log-verbose=true --allow-sagemaker=true --http-port=23000 --allow-grpc=false --allow-http=true --allow-metrics=true --model-control-mode=explicit --model-repository /tmp

Note that --model-repository is required, so I just set it to /tmp; in the docker/sagemaker/serve script, they set it to some kind of dummy path.

Second note: you want to map your local repos directory into the Docker container, which is what -v ${PWD}/repos:/repos is for. Keep this in mind when looking at the JSON for the POST request below.

It looks like the way SageMaker + Triton works, a new model repository is registered for each model.

So the way to load your models would be, for example:

POST http://localhost:8080/models

JSON:

{
    "model_name": "gustavosta-magicprompt-stable-diffusion",
    "url": "/repos/gustavosta-magicprompt-stable-diffusion"
}

Your local repos folder then looks something like this:

[screenshot of the local repos directory layout]
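For illustration, a hypothetical layout, assuming each subfolder of repos/ is itself a standard Triton model repository holding a single model (the names and backend below are made up):

repos/
  gustavosta-magicprompt-stable-diffusion/      # model repository passed as "url" in the POST body
    gustavosta-magicprompt-stable-diffusion/    # model directory inside that repository
      config.pbtxt
      1/
        model.py                                # e.g. a python-backend model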

Check out docker/sagemaker/serve and src/sagemaker_server.cc if you need to reverse-engineer anything further.

FYI, the HTTP server is not required; I have it enabled because I was doing some experiments.