Hi @sricke, could you share a guide on how the container is deployed inside Vertex AI?
Sure. Hope this helps.
This guide covers:
- uploading a custom Docker container to create a Vertex AI Model instance
- serving the previously created model to a Vertex AI Endpoint
- serving predictions with NVIDIA Triton
So the steps we're taking are:
1. Create the PyTriton Docker image and push it to Artifact Registry:
Dockerfile:

```dockerfile
FROM nvcr.io/nvidia/pytorch:23.10-py3

# Requirements are installed here to ensure they will be cached.
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

# Build arguments must be declared before they can be expanded in ENV.
ARG MODEL_NAME
ARG TRITON_PORT

# Set environment variables (PYTHONUNBUFFERED=1 disables output buffering
# so logs appear immediately in Vertex AI).
ENV PYTHONUNBUFFERED=1
ENV MODEL_NAME=${MODEL_NAME}
ENV TRITON_PORT=${TRITON_PORT}

COPY model.py /home/app/src/model.py
COPY server.py /home/app/src/server.py
WORKDIR /home/app/

# Exec form so signals reach the Python process directly.
CMD ["python3", "src/server.py", "--vertex"]
```
```bash
# Build the Docker image (replace the path with your Dockerfile folder).
docker build -t pytriton <path-to-dockerfile-folder>

# Configure authentication to the Artifact Registry repo.
gcloud auth configure-docker $REGION-docker.pkg.dev --quiet
IMAGE_URI=$REGION-docker.pkg.dev/$PROJECT_ID/$DOCKER_ARTIFACT_REPO/pytriton

# Tag and upload the model Docker image.
docker tag pytriton $IMAGE_URI
docker push $IMAGE_URI
```
2. Create a Vertex AI Model instance, using the previously created Artifact Registry image.
3. Create a Vertex AI Endpoint, using the previously created Model instance (see the sketch below).
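A minimal sketch of steps 2 and 3 with the `google-cloud-aiplatform` SDK; the project, region, repo, machine type, and accelerator values are assumptions, and the routes match the ones discussed later in this thread:

```python
# Minimal sketch (assumed values) of creating the Model instance and Endpoint.
from google.cloud import aiplatform

PROJECT_ID = "my-project"    # assumed
REGION = "us-central1"       # assumed
IMAGE_URI = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/my-repo/pytriton"  # assumed

aiplatform.init(project=PROJECT_ID, location=REGION)

# Step 2: create the Vertex AI Model instance from the pushed container image.
model = aiplatform.Model.upload(
    display_name="pytriton-stable-diffusion",
    serving_container_image_uri=IMAGE_URI,
    serving_container_predict_route="/v2/models/StableDiffusion_Img2Img/infer",
    serving_container_health_route="/v2/health/live",
    serving_container_ports=[8015],
)

# Step 3: deploy the Model instance to a new Vertex AI Endpoint.
endpoint = model.deploy(
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
```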
Thanks @sricke! Let us review that and get back to you.
Hi @sricke. The PyTriton path for deploying a model would be using a custom container, as described here: https://cloud.google.com/vertex-ai/docs/predictions/use-custom-container
Could you also remove any flags related to Vertex AI in `TritonConfig` and provide the execution log? They should not be necessary and may even cause problems like this error:

```
failed to start Vertex AI service: Invalid argument - Expect the model repository contains only a single model if default model is not specified
```
I believe the model file you are providing from Cloud Storage is read inside `model.py`? The `model_repository` required for a pure Triton-based deployment is not needed here.
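For illustration, a minimal sketch (tensor names, shapes, and the callable are assumptions, not the author's actual `server.py`) showing that PyTriton binds an inference callable directly, with no on-disk `model_repository`:

```python
# Minimal sketch (assumed names/shapes) of binding a Python callable with
# PyTriton; weights can be loaded from Cloud Storage beforehand in model.py.
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def infer_fn(image):
    # Placeholder for the real pipeline call implemented in model.py.
    return {"output": image}


with Triton() as triton:
    triton.bind(
        model_name="StableDiffusion_Img2Img",
        infer_func=infer_fn,
        inputs=[Tensor(name="image", dtype=np.uint8, shape=(-1,))],
        outputs=[Tensor(name="output", dtype=np.uint8, shape=(-1,))],
        config=ModelConfig(max_batch_size=4),
    )
    triton.serve()
```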
@jkosek the route I'm using for Vertex predict is `/v2/models/StableDiffusion_Img2Img/infer`, and for the health check it's `/v2/health/live`.
Yes, I have a `model.py` file that basically loads a Stable Diffusion pipeline using model files and weights downloaded from Cloud Storage. We've also tried using a standard YOLO model, ending up with the same errors.
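For context, a hypothetical sketch (bucket name, prefix, and pipeline class are assumptions, not the author's code) of what such a `model.py` loading step can look like:

```python
# Hypothetical sketch (assumed bucket/paths): download pipeline weights from
# Cloud Storage into a local directory, then build the img2img pipeline.
import pathlib

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from google.cloud import storage

local_dir = pathlib.Path("/home/app/weights")
local_dir.mkdir(parents=True, exist_ok=True)

# Download every blob under the model prefix from the bucket.
client = storage.Client()
for blob in client.list_blobs("my-model-bucket", prefix="stable-diffusion/"):
    if blob.name.endswith("/"):  # skip zero-byte "directory" placeholders
        continue
    target = local_dir / pathlib.Path(blob.name).relative_to("stable-diffusion")
    target.parent.mkdir(parents=True, exist_ok=True)
    blob.download_to_filename(str(target))

pipeline = StableDiffusionImg2ImgPipeline.from_pretrained(
    str(local_dir), torch_dtype=torch.float16
).to("cuda")
```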
If I remove the Vertex AI flags in `TritonConfig`, the model is loaded correctly, but Vertex AI sends a series of health checks that return error 400 and eventually shuts down the server. Attaching logs:
```
I1117 19:51:48.680199 138 vertex_ai_server.cc:350] Started Vertex AI HTTPService at 0.0.0.0:8015
I1117 19:51:48.721858 138 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002
I1117 19:51:49.489155 138 vertex_ai_server.cc:108] Vertex AI request: 0 /v2/health/ready
I1117 19:51:49.489196 138 vertex_ai_server.cc:227] Vertex AI error: 0 /v2/health/ready - 400
I1117 19:51:49.489566 138 vertex_ai_server.cc:108] Vertex AI request: 0 /v2/health/ready
I1117 19:51:49.489584 138 vertex_ai_server.cc:227] Vertex AI error: 0 /v2/health/ready - 400
...
(this is repeated for 2 minutes)
...
I1117 19:53:44.693555 138 vertex_ai_server.cc:108] Vertex AI request: 0 /v2/health/ready
I1117 19:53:44.693587 138 vertex_ai_server.cc:227] Vertex AI error: 0 /v2/health/ready - 400
2023-11-17 19:53:46,696 - DEBUG - pytriton.triton: Stopping Triton Inference server and proxy backends
2023-11-17 19:53:46,696 - DEBUG - pytriton.server.triton_server: Stopping Triton Inference server - sending SIGINT signal and wait 30s
2023-11-17 19:53:46,696 - DEBUG - pytriton.server.triton_server: Waiting for process to stop
2023-11-17 19:53:49,766 - DEBUG - pytriton.server.triton_server: Triton Inference Server stopped
...
2023-11-17 16:53:50.036 raise PyTritonClientTimeoutError("Waiting for server to be ready timed out.")
2023-11-17 16:53:50.036 pytriton.client.exceptions.PyTritonClientTimeoutError: Waiting for server to be ready timed out.
```
@sricke thanks for the information and your patience. I was able to reproduce the first reported error locally:

```
failed to start Vertex AI service: Invalid argument - Expect the model repository contains only a single model if default model is not specified
```
The issue we are seeing might be caused by some internal behavior of PyTriton in versions >=0.4.0.
Could you please try PyTriton 0.3.1:

```bash
pip install "nvidia-pytriton==0.3.1"
```

with the following `TritonConfig`:

```python
TritonConfig(exit_on_error=True, log_verbose=log_verbose, allow_vertex_ai=True, vertex_ai_port=8015)
```
Let me know if that helped.
@sricke I was wondering if the suggestions you received were helpful?
@jkosek I tried the configuration you mentioned and while the model loads correctly it eventually raises an error.
Attaching logs:
```
I1130 19:14:31.664455 143 vertex_ai_server.cc:108] Vertex AI request: 0 /v2/health/live
I1130 19:14:34.753330 143 vertex_ai_server.cc:108] Vertex AI request: 0 /v2/health/live
I1130 19:14:44.753448 143 vertex_ai_server.cc:108] Vertex AI request: 0 /v2/health/live
2023-11-30 16:14:50.023 Signal (2) received.
I1130 19:14:49.013554 143 server.cc:305] Waiting for in-flight requests to complete.
I1130 19:14:49.013582 143 server.cc:321] Timeout 30: Found 0 model versions that have in-flight inferences
I1130 19:14:49.013727 143 server.cc:336] All models are stopped, unloading models
I1130 19:14:49.013742 143 server.cc:343] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I1130 19:14:49.013748 143 server.cc:350] StableDiffusion_Img2Img v1: UNLOADING
I1130 19:14:49.013830 143 backend_model_instance.cc:828] Stopping backend thread for StableDiffusion_Img2Img_0...
I1130 19:14:49.013914 143 python_be.cc:2248] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1130 19:14:50.013939 143 server.cc:343] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
I1130 19:14:50.013980 143 server.cc:350] StableDiffusion_Img2Img v1: UNLOADING
I1130 19:14:50.015469 143 model.py:218] Finalizing backend instance.
I1130 19:14:50.015632 143 model.py:219] Cleaning socket and context.
I1130 19:14:50.016192 143 model.py:228] Removing allocated shared memory.
2023-11-30 19:14:52,119 - DEBUG - pytriton.server.triton_server: Triton Inference Server stopped
2023-11-30 19:14:52,119 - DEBUG - pytriton.models.manager: Clean model ('stablediffusion_img2img', 1).
...
pytriton.client.exceptions.PyTritonClientTimeoutError: Waiting for server to be ready timed out.
```
It seems like it's getting an interrupt signal. Before that, the liveness checks on `/v2/health/live` seem to be returning 200, so I don't know why this is happening.
@sricke would you be able to share the full log in debug mode? This would help me see the whole process from start to failure. Thanks!
@sricke I was able to reproduce the problem with a simple model. I will keep looking for the root cause. Thanks for your patience.
@jkosek perfect thanks! Let me know if you find the cause.
For now, what I am seeing is an error while PyTriton queries the Triton server for the model status. This might be caused by the HTTP endpoint being unavailable while Vertex AI support is enabled.
Could you modify the `TritonConfig` as follows:

```python
TritonConfig(exit_on_error=True, log_verbose=log_verbose, allow_http=True, allow_vertex_ai=True, vertex_ai_port=8015)
```

Please let me know if that helped. We will also work on a long-term fix related to model loading.
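For reference, a minimal sketch (address, port, and model name are assumed) of the readiness wait PyTriton's client performs over the HTTP endpoint, which is what times out with `PyTritonClientTimeoutError` when HTTP is disabled:

```python
# Minimal sketch (assumed address/model name) of the readiness wait that
# fails while Triton's HTTP endpoint is unavailable.
from pytriton.client import ModelClient

with ModelClient("http://localhost:8000", "StableDiffusion_Img2Img") as client:
    client.wait_for_model(timeout_s=60)  # raises PyTritonClientTimeoutError on timeout
```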
@sricke any update on running the solution with the `allow_http=True` flag passed in `TritonConfig`? This solved the problem in the minimal example I tested on Vertex AI.
@jkosek sorry for the delayed response. I tried this specific configuration and now it works! Thanks a lot!
Perfect!
Will keep the issue open until we fix model loading in a future release.
Just to repeat the WAR (workaround): install PyTriton 0.3.1:

```bash
pip install "nvidia-pytriton==0.3.1"
```

and pass `allow_http` to `TritonConfig` along with `allow_vertex_ai`:

```python
TritonConfig(allow_http=True, allow_vertex_ai=True)
```
PyTriton 0.5.2 introduced support for Vertex AI. See the example for more details.
Description
Hi! I'm trying to deploy a Stable Diffusion model on GCP Vertex AI using the PyTriton backend. My code works on a local machine, and I've been able to send requests and receive inference responses.
My problem arrives when I try to create an endpoint using Vertex AI. The server run fails with the error:
And then:
I don't know whether the error with the Vertex AI service is due to the server crashing first, or vice versa.
To reproduce
Attaching my server code
When creating the Vertex endpoint, the server predict route is configured to `/v2/models/StableDiffusion_Img2Img/infer` and the server health route is configured to `/v2/health/live`, with the Vertex port set to 8015, the same as the HTTP port set in the model configuration.
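For reference, a hypothetical local smoke test (host, input tensor name, shape, and payload are assumptions) exercising the same two routes via Triton's KServe v2 HTTP protocol:

```python
# Hypothetical local check (assumed host/input tensor) of the health and
# predict routes configured above, using the KServe v2 HTTP protocol.
import requests

base = "http://localhost:8015"

# Liveness probe: Vertex AI polls this route.
assert requests.get(f"{base}/v2/health/live").status_code == 200

# Inference request: payload fields follow the KServe v2 schema.
payload = {
    "inputs": [
        {"name": "image", "shape": [1, 3], "datatype": "UINT8", "data": [1, 2, 3]}
    ]
}
resp = requests.post(f"{base}/v2/models/StableDiffusion_Img2Img/infer", json=payload)
print(resp.status_code, resp.json())
```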
Observed results and expected behavior
As stated, the server runs on a local machine but fails to initialize the endpoint in Vertex AI. During the Vertex build, local files are correctly downloaded and the model pipeline is loaded, so the error is probably in the `triton.bind()` call. Attaching the complete log output:
Additional steps taken
Based on the timeout error raised by PyTriton, we've tried increasing the timeout by setting `monitoring_period_s` in `server.run()` to an arbitrarily high threshold. We've also tried adapting the server configuration to Vertex with:
But we get the same error.
Environment
Docker base image: `nvcr.io/nvidia/pytorch:23.10-py3`
Requirements:
Any help is appreciated!!