triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Could not load model using mlflow triton plugin with S3/minio as model repository #5964

Open pragadeeshraju opened 1 year ago

pragadeeshraju commented 1 year ago

Description: Could not load a model using MLflow with MinIO as the model repository. I have tried this with an AWS S3 bucket and it worked as expected. I have followed this article: MLflow Triton Plugin.

To Reproduce: Steps to reproduce the behavior. Set the env vars as expected, using nvcr.io/nvidia/morpheus/mlflow-triton-plugin:2.2.2:

export MLFLOW_TRACKING_URI=xxxxxx
export TRITON_MODEL_REPO=xxxxx
export AWS_ACCESS_KEY_ID=xxxxx
export AWS_SECRET_ACCESS_KEY=xxx

mlflow deployments create -t triton --flavor triton --name damo-yolo-tiny -m models:/damo-yolo-tiny/1

Copied /tmp/tmpw3p8vzuq/damo-yolo-tiny to s3://mino:9000/model-registry/models/damo-yolo-tiny
Saved mlflow-meta.json to s3://mino:9000/model-registry/models/damo-yolo-tiny
Traceback (most recent call last):
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/mlflow_triton/deployments.py", line 109, in create_deployment
    self.triton_client.load_model(name)
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/tritonclient/http/__init__.py", line 691, in load_model
    _raise_if_error(response)
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/tritonclient/http/__init__.py", line 65, in _raise_if_error
    raise error
tritonclient.utils.InferenceServerException: failed to load 'damo-yolo-tiny', failed to poll from model repository

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/mlflow/bin/mlflow", line 8, in <module>
    sys.exit(cli())
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/mlflow/deployments/cli.py", line 142, in create_deployment
    deployment = client.create_deployment(name, model_uri, flavor, config=config_dict)
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/mlflow_triton/deployments.py", line 111, in create_deployment
    raise MlflowException(str(ex))
mlflow.exceptions.MlflowException: failed to load 'damo-yolo-tiny', failed to poll from model repository

But the same works well with AWS S3...

Not sure what I am missing here...

Reference for S3 as model repo
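
For reference, a rough sketch of what this environment looks like when MinIO is the backing store (hostnames, bucket, and credential values below are placeholders, not the actual ones used here):

# hypothetical values for illustration only
export MLFLOW_TRACKING_URI=http://mlflow.example.svc.cluster.local:5000
export TRITON_URL=triton-inference-server.example.svc.cluster.local:8000
# for a non-AWS S3 endpoint such as MinIO, Triton expects host and port in the repository path
export TRITON_MODEL_REPO=s3://minio.example.svc.cluster.local:9000/model-registry/models
export AWS_ACCESS_KEY_ID=xxxxx
export AWS_SECRET_ACCESS_KEY=xxxxx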

kthui commented 1 year ago

Hi @pragadeeshraju, would you be able to try placing a simple model onto the MinIO server and loading it directly with Triton? This should help us check if there is any configuration issue between Triton and MinIO.

Here are some example models: https://github.com/triton-inference-server/server/tree/main/docs/examples/model_repository
A quick start to launch Triton: https://github.com/triton-inference-server/server/blob/main/docs/getting_started/quickstart.md
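
For example, a minimal sketch of pointing Triton directly at the MinIO bucket to rule out a Triton-to-MinIO configuration problem (the image tag, endpoint, bucket, and credentials are placeholders):

docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -e AWS_ACCESS_KEY_ID=xxxxx \
  -e AWS_SECRET_ACCESS_KEY=xxxxx \
  -e AWS_DEFAULT_REGION=us-west-2 \
  nvcr.io/nvidia/tritonserver:23.06-py3 \
  tritonserver --model-repository=s3://minio.example.svc.cluster.local:9000/model-registry/models

# then list what the server actually found in the repository
curl -s -X POST localhost:8000/v2/repository/index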

pragadeeshraju commented 1 year ago

Hi @kthui, loading directly from MinIO works fine, no issues with that. But trying to load the model with the MLflow Triton plugin does not. I'm doing this in an EKS cluster.

I have another question, related to loading a model to all the Triton servers. Using export TRITON_URL=tris-triton-inference-server.triton.svc.cluster.local:8000 loads the model to only one pod behind the service. Is there any way to load the models to all the Triton pods? Basically, I would like to know how this would work for scalability.
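
One possible workaround, as a sketch: the plugin's load call is just a POST to Triton's model-control endpoint (it shows up later in the server log as /v2/repository/models/<name>/load), so the same request could be sent to each pod's IP rather than through the Service, which only reaches one pod per request. The namespace, pod label, and model name below are assumptions:

for pod_ip in $(kubectl -n triton get pods -l app=triton-inference-server \
    -o jsonpath='{.items[*].status.podIP}'); do
  curl -s -X POST "http://${pod_ip}:8000/v2/repository/models/damo-yolo-tiny/load"
done

This only applies when the servers run with explicit model control, which the plugin's load/unload flow already relies on.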

kthui commented 1 year ago

export MLFLOW_TRACKING_URI=xxxxxx
export TRITON_MODEL_REPO=xxxxx
export AWS_ACCESS_KEY_ID=xxxxx
export AWS_SECRET_ACCESS_KEY=xxx

Could you check if the AWS_DEFAULT_REGION environment variable is set when you are using the MinIO server? We have seen cases in the past where not setting this variable caused problems connecting to a local server. It can be set to any value, e.g. export AWS_DEFAULT_REGION=us-west-2.

If this does not work, can you start the server with the --log-verbose=2 flag set and share the server log from when the issue occurs? The log could be helpful in triaging the issue.
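
For example (the repository path below is a placeholder):

# on the MLflow plugin side
export AWS_DEFAULT_REGION=us-west-2

# on the Triton side, e.g. in the container args of the deployment
tritonserver --model-repository=s3://minio.example.svc.cluster.local:9000/model-registry/models \
  --log-verbose=2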

pragadeeshraju commented 1 year ago

I have tried adding AWS_DEFAULT_REGION and checked, but ended up with the same results...

Let me know if you need more information.

I'm running both the Triton server and the MLflow plugin in EKS. Do I need to enable any particular port or make any config changes on the EKS side?

MLflow plugin logs

(mlflow) root@mlflow-plugin-5c7dfdcbb8-vsr5p:/mlflow# mlflow deployments create -t triton --flavor triton --name hoxtonhead-mlflow -m models:/hoxtonhead-mlflow/1
Copied /tmp/tmp1bzd06zm/hoxtonhead-mlflow to s3://minio-service.kubeflow.svc.cluster.local:9000/model-registry/models/hoxtonhead-mlflow
Saved mlflow-meta.json to s3://minio-service.kubeflow.svc.cluster.local:9000/model-registry/models/hoxtonhead-mlflow
Traceback (most recent call last):
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/mlflow_triton/deployments.py", line 109, in create_deployment
    self.triton_client.load_model(name)
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/tritonclient/http/__init__.py", line 691, in load_model
    _raise_if_error(response)
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/tritonclient/http/__init__.py", line 65, in _raise_if_error
    raise error
tritonclient.utils.InferenceServerException: failed to load 'hoxtonhead-mlflow', failed to poll from model repository

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/mlflow/bin/mlflow", line 8, in <module>
    sys.exit(cli())
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/mlflow/deployments/cli.py", line 142, in create_deployment
    deployment = client.create_deployment(name, model_uri, flavor, config=config_dict)
  File "/opt/conda/envs/mlflow/lib/python3.10/site-packages/mlflow_triton/deployments.py", line 111, in create_deployment
    raise MlflowException(str(ex))
mlflow.exceptions.MlflowException: failed to load 'hoxtonhead-mlflow', failed to poll from model repository

Triton server log

I0622 10:31:05.991049 1 model_lifecycle.cc:264] ModelStates()
I0622 10:31:08.042284 1 http_server.cc:3216] HTTP request: 2 /v2/repository/models/hoxtonhead-mlflow/load
I0622 10:31:08.042341 1 filesystem.cc:2335] Using credential    for path  s3://minio-service.kubeflow.svc.cluster.local:9000/model-registry/models/hoxtonhead-mlflow
I0622 10:31:10.991168 1 http_server.cc:3216] HTTP request: 0 /v2/health/live
I0622 10:31:10.991168 1 http_server.cc:3216] HTTP request: 0 /v2/health/ready
I0622 10:31:10.991232 1 model_lifecycle.cc:264] ModelStates()

pragadeeshraju commented 1 year ago

After a lot of testing, it seems that mlflow deployments is not able to load models into Triton using the NVIDIA mlflow-triton-plugin:2.2.2 Docker image. I can only load models that were already present in S3; loading new models fails, even though they exist in S3 after the plugin copies them...

This can be reproduced using the Docker image with S3/MinIO as the model repository...
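
One way to narrow this down is to compare what actually lands in the bucket for a model that loads versus one that fails, for example with the AWS CLI pointed at the MinIO endpoint (the endpoint and bucket are taken from the logs above; the CLI is not part of the plugin image and would need to be installed separately):

aws s3 ls --recursive \
  --endpoint-url http://minio-service.kubeflow.svc.cluster.local:9000 \
  s3://model-registry/models/hoxtonhead-mlflow/
# a loadable model typically shows a config.pbtxt plus at least one numeric
# version directory, e.g. hoxtonhead-mlflow/1/model.pt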

kthui commented 1 year ago

Thanks for providing more details. I have created a ticket for us to investigate further. DLIS-5063

ktsaliagkos commented 9 months ago

@kthui Is this resolved? The same thing happens with GCS. Separately, both the Triton and the MLflow servers work great: Triton is able to load models from its gs://... model repository, and MLflow uploads models to its gs://... artifact destination, no problem at all. However, exec-ing into the plugin and running the mlflow deployments create command gives the exact same exception: mlflow.exceptions.MlflowException: failed to load <model_name>, failed to poll from model repository. Is this use case supported? The docs say nothing about it.

cile98 commented 6 months ago

@kthui I was just trying to deploy a model from the MLflow server to a Triton server deployed on a k8s cluster via MinIO and got the same error message. Do you have any idea why this is happening?

cile98 commented 6 months ago

@pragadeeshraju @ktsaliagkos I have figured it out and it now works for me with MinIO storage. Multiple issues can trigger this exception. First, it can be caused by a wrong MinIO storage URI: the script expects it to contain a port number, and what ended up working for me was something like s3://https://miniohost.com:8000/bucketname. Another thing when deploying models from MLflow is to ensure that they have the right folder structure (check the example here: https://github.com/triton-inference-server/tutorials/tree/main/Quick_Deploy/PyTorch) and to include a config.pbtxt file. After I fixed those two things, everything worked.
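
To illustrate the second point, a minimal sketch of the layout and config the plugin needs to end up with in the bucket; the model name, backend, and tensor names/shapes here are made up for illustration and are not taken from the actual models in this thread:

# expected layout under the repository prefix
#   models/damo-yolo-tiny/config.pbtxt
#   models/damo-yolo-tiny/1/model.pt
cat > config.pbtxt <<'EOF'
name: "damo-yolo-tiny"
backend: "pytorch"
max_batch_size: 8
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 3, 640, 640 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
EOF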

AsoTora commented 3 months ago

Thanks @cile98, I was having similar issues while setting this up with the Bitnami chart for MLflow, and this thread gave me the right ideas to look into.

Specifically, one of the problems was solved by adding the port to the TRITON_URL=triton-inference-server.triton.svc.cluster.local variable.
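
That is, something along these lines (the service name is from the comment above; 8000 is Triton's default HTTP port, so treat the exact value as an assumption for your setup):

export TRITON_URL=triton-inference-server.triton.svc.cluster.local:8000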