triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Triton does not refresh aws credentials when using IAM roles #2871

Closed nsiddharth closed 2 years ago

nsiddharth commented 3 years ago

Description

Using IAM roles is a security best practice for AWS deployments. When an IAM role is assumed, an AWS_SESSION_TOKEN is generated which has a fixed expiry. (I had previously submitted a PR which included this token here.) However, the credentials need to be refreshed periodically so that access to the model repository is maintained, both to load newer models in explicit mode and to detect changes in poll mode. Currently, Triton is unable to access the model repository after the session token expires. It looks like Triton should be using the STS API, which has a load_frequency parameter as shown here, or something similar that allows credentials to be refreshed periodically. Without this ability it is not possible to use Triton in production AWS deployments, so hopefully the issue can be resolved soon.

Triton Information

What version of Triton are you using? 2.8.0

Are you using the Triton container or did you build it yourself? Using the container.

To Reproduce

A simple call to unload and load models will fail after the session token expires (default 1 hour). A script like the following can be used:

import tritonclient.grpc as grpcclient
from tritonclient.utils import InferenceServerException

MODEL_NAME = "simple"

URL = "localhost:8001"
triton_client = grpcclient.InferenceServerClient(url=URL, verbose=True)

triton_client.is_server_live()
print(triton_client.is_model_ready(MODEL_NAME))
triton_client.get_model_repository_index().models

"""
model loading and unloading 
This should pickup the latest available model
without having to call:
triton_client.unload_model(MODEL_NAME)
"""
triton_client.load_model(MODEL_NAME)
"""This should show that the latest model is available"""
triton_client.get_model_repository_index().models

Following are server side logs when the token expires:

I0513 17:31:57.765850 1 http_server.cc:2717] Started HTTPService at 0.0.0.0:8000
I0513 17:31:57.814511 1 http_server.cc:2736] Started Metrics Service at 0.0.0.0:8002
I0513 17:53:06.044900 1 model_repository_manager.cc:820] unloading: deep_retrieval_model_index:6
I0513 17:53:06.054417 1 tensorflow.cc:2064] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0513 17:53:09.371870 1 tensorflow.cc:2003] TRITONBACKEND_ModelFinalize: delete model state
I0513 17:53:13.049231 1 model_repository_manager.cc:943] successfully unloaded 'deep_retrieval_model_index' version 6
I0513 17:53:30.712899 1 model_repository_manager.cc:787] loading: deep_retrieval_model_index:7
E0513 18:22:18.200294 1 model_repository_manager.cc:963] failed to load 'deep_retrieval_model_index' version 7: Internal: Could not get MetaData for bucket with name <model_repository>

As you can see from the timestamps, the model fails to load as soon as the token expires.

Expected behavior

Triton should retain access to the model repository on S3 by using refreshed credentials. load_model and poll mode should not fail when using IAM roles with temporary credentials.
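The requested behavior can be illustrated outside of Triton. The snippet below is a hypothetical Python sketch (none of these names come from Triton or the AWS SDK): a wrapper that refreshes credentials and retries once when a call fails because the session token has expired.

```python
# Hypothetical sketch: retry an S3-style call once after refreshing
# credentials when the session token has expired. The names here are
# illustrative only; they do not exist in Triton or the AWS SDK.

class ExpiredTokenError(Exception):
    """Stands in for the SDK's 'token expired' error."""

def with_refresh(call, refresh_credentials):
    """Run `call()`; on an expired token, refresh and retry once."""
    try:
        return call()
    except ExpiredTokenError:
        refresh_credentials()  # e.g. re-run sts assume-role
        return call()

# Minimal demonstration with fakes standing in for S3 and STS:
state = {"token_valid": False, "refreshes": 0}

def fake_s3_list():
    if not state["token_valid"]:
        raise ExpiredTokenError()
    return ["model_v7"]

def fake_refresh():
    state["refreshes"] += 1
    state["token_valid"] = True

print(with_refresh(fake_s3_list, fake_refresh))  # ['model_v7']
```

The point of the sketch is only that the refresh happens transparently inside the repository access path, which is what poll mode and load_model would need.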

CoderHam commented 3 years ago

@nsiddharth Triton does not create a new S3 client for each request. The life of the S3 client begins when the server is launched with an S3 path and ends when the server is terminated. Supporting a use case where the credentials are refreshed would require a re-design of the cloud filesystem logic. While we do understand your request, we would like to better understand how you would wish for this to be solved. Below I have listed two ways to do what you have asked:

  1. Use the 'current' AWS credentials to create a new S3 client for each S3 request Triton makes, i.e. each model load / poll / unload. (This would add a large overhead for most users and is undesirable.)
  2. Have Triton create a new S3 client instance as soon as you update the S3 ENV variables. How would you expect Triton to know when the AWS credentials are updated? Periodically checking for updates to the ENV variables?
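Option 2 might look roughly like the following. This is a hypothetical Python sketch with a stand-in client class (Triton's filesystem code is C++; this only illustrates the env-var-snapshot idea):

```python
import os

class FakeS3Client:
    """Stand-in for an SDK S3 client built from credentials."""
    def __init__(self, key, secret, token):
        self.creds = (key, secret, token)

WATCHED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")

class RefreshingClientHolder:
    """Recreate the client whenever the credential env vars change."""
    def __init__(self):
        self._snapshot = None
        self._client = None

    def client(self):
        snapshot = tuple(os.environ.get(k, "") for k in WATCHED)
        if snapshot != self._snapshot:  # first call, or env vars changed
            self._snapshot = snapshot
            self._client = FakeS3Client(*snapshot)
        return self._client

# Demonstration: updating AWS_SESSION_TOKEN yields a fresh client.
holder = RefreshingClientHolder()
os.environ["AWS_SESSION_TOKEN"] = "token-1"
first = holder.client()
os.environ["AWS_SESSION_TOKEN"] = "token-2"
second = holder.client()
print(first is second)  # False: the token change produced a new client
```

Checking the snapshot lazily on each repository access (rather than on a timer) would mean no extra work when the credentials have not changed.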

cc @deadeyegoodwin

nsiddharth commented 3 years ago

@CoderHam I am not entirely sure how to address this issue. There is not much I could find online, especially for the C++ SDK. The boto3 client seems to have a way to do this here. Also, it seems from the C++ SDK documentation that there is a way to check when to refresh (IsTimeToRefresh()) and reload (Reload()) the credentials here.

It also seems like every call to assume-role refreshes the credentials. I was able to verify this using the CLI like so: aws sts assume-role --role-arn $AWS_IAM_ROLE --role-session-name TestSession. It is not clear to me what the best way to address this in the most general manner is, but on our end we could set the default expiry to a longer time (the maximum is 12 hours, I believe) and have Triton use new credentials when the old ones are about to expire, or refresh at fixed intervals. Ideally, the client would pick up the refreshed credentials whenever they change; that would be the best way to address this. I am not sure there is support for that in the SDK.

nsiddharth commented 3 years ago

Also, looking at this thread, it seems like a credentials provider object needs to be passed instead of raw credentials. The credentials provider object has built-in refresh methods, so the client need not be re-instantiated. That is my understanding from what I've read online so far.
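The provider pattern could be mimicked like this in Python (hypothetical names and a fake credential fetcher; the real C++ SDK interface is the AWSCredentialsProvider with its IsTimeToRefresh()/Reload() methods mentioned above):

```python
import time

class RefreshingCredentialsProvider:
    """Hand out credentials, reloading them shortly before expiry,
    so the client holding the provider never has to be recreated."""
    def __init__(self, fetch, margin_s=300):
        self._fetch = fetch       # returns (creds, expiry_epoch_seconds)
        self._margin = margin_s   # refresh this long before expiry
        self._creds, self._expiry = fetch()

    def _is_time_to_refresh(self):
        return time.time() >= self._expiry - self._margin

    def get_credentials(self):
        if self._is_time_to_refresh():
            self._creds, self._expiry = self._fetch()  # the Reload() analog
        return self._creds

# Demonstration with a fake fetcher that issues short-lived credentials:
counter = {"n": 0}
def fake_fetch():
    counter["n"] += 1
    return f"creds-{counter['n']}", time.time() + 1  # expires in 1 second

provider = RefreshingCredentialsProvider(fake_fetch, margin_s=0)
a = provider.get_credentials()
time.sleep(1.1)                  # let the first credentials expire
b = provider.get_credentials()
print(a, b)  # creds-1 creds-2
```

Because the provider is consulted on every request, the long-lived S3 client sees fresh credentials without being torn down, which seems to match what the SDK thread describes.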

DXist commented 3 years ago

I've created similar issue - https://github.com/triton-inference-server/server/issues/2657

Currently I use a sidecar script that rotates temporary credentials every 10 minutes. It mostly works.

Unfortunately, once a day, after the aws-iam-token expires, there is a ~5 minute window before the next credentials are obtained by the script.

Triton fails to poll the repository and unloads already-loaded models, making the service unavailable during this window.

I run Triton with the --strict-readiness=false, --exit-on-error=true and --model-control-mode=poll parameters. I don't use --exit-on-error=false because Triton doesn't retry model polling during startup initialisation if one of the models is misconfigured (but that is another issue and out of scope of this one).

For deployments in Kubernetes it's convenient to use this provider and to configure an IAM policy that allows the Triton serviceaccount in K8s to assume the role with access to S3.

nsiddharth commented 3 years ago

@DXist Thanks for the advice! How does your sidecar script work? I thought you would need to make this change inside Triton so it can reload the creds at regular intervals. Once Triton is up, how does your sidecar script ensure that Triton gets the most recent credentials? I think the provider you have linked to could be used within Triton. I can give it a shot if the Triton devs can confirm that it is the right way to go.

DXist commented 3 years ago

@nsiddharth Triton and the sidecar use the same emptyDir volume to access credentials.

Sidecar configuration:

      containers:
        - name: aws-credentials-rotator
          image: amazon/aws-cli
          command: ["/bin/bash"]
          args:
            - "-c"
            - >-
              while true; do
              aws sts assume-role-with-web-identity --role-arn $AWS_ROLE_ARN --role-session-name session-`date +%s` --web-identity-token file://$AWS_WEB_IDENTITY_TOKEN_FILE  > /tmp/creds &&
              aws configure set region eu-central-1 &&
              aws configure set aws_access_key_id `grep -oP '(?<=AccessKeyId": ")[^"]+' /tmp/creds` &&
              aws configure set aws_secret_access_key `grep -oP '(?<=SecretAccessKey": ")[^"]+' /tmp/creds` &&
              aws configure set aws_session_token `grep -oP '(?<=SessionToken": ")[^"]+' /tmp/creds` &&
              sleep 600;
              done
          env:
            - name: AWS_SHARED_CREDENTIALS_FILE
              value: /aws/credentials
            - name: AWS_CONFIG_FILE
              value: /aws/config
          volumeMounts:
            - mountPath: /aws
              name: aws-credentials

As I've mentioned, once a day the web-identity token expires, making the temporary credentials invalid until the next script run. It's probably possible to watch token rotation using inotify and re-run the script on token file update.
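That watcher idea can be approximated with a small script. Below is a hypothetical Python sketch (not part of the sidecar above): it polls mtime/size instead of using inotify so it stays stdlib-only, and the rotation action is a placeholder callback where the assume-role commands would go.

```python
import os
import tempfile
import threading
import time

def watch_file(path, on_change, interval_s=0.2, max_polls=10):
    """Poll `path` and call `on_change` when its mtime or size changes.
    inotify would avoid the polling, but stat-polling works anywhere."""
    st = os.stat(path)
    last = (st.st_mtime_ns, st.st_size)
    for _ in range(max_polls):
        time.sleep(interval_s)
        st = os.stat(path)
        current = (st.st_mtime_ns, st.st_size)
        if current != last:
            last = current
            on_change()  # e.g. re-run the assume-role rotation commands

# Demonstration: a background "writer" stands in for token rotation.
events = []
with tempfile.NamedTemporaryFile(delete=False) as f:
    token_path = f.name

def writer():
    time.sleep(0.3)
    with open(token_path, "w") as fh:
        fh.write("new-web-identity-token")

threading.Thread(target=writer).start()
watch_file(token_path, lambda: events.append("rotated"))
print(events)  # ['rotated']
```

Reacting to the token file directly would close the ~5 minute window, since new credentials would be fetched as soon as Kubernetes rotates the token rather than on the next 10-minute tick.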

But still, Triton shouldn't unload models on an S3 repository poll failure.

nsiddharth commented 3 years ago

@CoderHam I am trying to fix this on my end using the STS API and need a little help building the server. I have added the sts package to build/CMakeLists.txt like so:

ExternalProject_Add(aws-sdk-cpp
  PREFIX aws-sdk-cpp
  GIT_REPOSITORY "https://github.com/aws/aws-sdk-cpp.git"
  GIT_TAG "1.7.129"
  LIST_SEPARATOR "|"
  SOURCE_DIR "${CMAKE_CURRENT_BINARY_DIR}/aws-sdk-cpp/src/aws-sdk-cpp"
  CMAKE_CACHE_ARGS
    ${_CMAKE_ARGS_CMAKE_TOOLCHAIN_FILE}
    ${_CMAKE_ARGS_VCPKG_TARGET_TRIPLET}
    -DBUILD_ONLY:STRING=s3|sts
    -DBUILD_SHARED_LIBS:BOOL=OFF
    -DMINIMIZE_SIZE:BOOL=ON
    -DENABLE_TESTING:BOOL=OFF
    -DCMAKE_POSITION_INDEPENDENT_CODE:BOOL=ON
    -DCMAKE_INSTALL_PREFIX:PATH=${CMAKE_CURRENT_BINARY_DIR}/aws-sdk-cpp/install
)

but I have not been able to get it to compile as I run into the following error:

/usr/bin/ld: libtritonserver.so: undefined reference to `Aws::STS::STSClient::~STSClient()'
/usr/bin/ld: libtritonserver.so: undefined reference to `vtable for Aws::STS::Model::AssumeRoleRequest'
/usr/bin/ld: libtritonserver.so: undefined reference to `Aws::STS::STSClient::AssumeRole(Aws::STS::Model::AssumeRoleRequest const&) const'
/usr/bin/ld: libtritonserver.so: undefined reference to `Aws::STS::Model::AssumeRoleRequest::AssumeRoleRequest()'
/usr/bin/ld: libtritonserver.so: undefined reference to `Aws::STS::STSClient::STSClient(Aws::Client::ClientConfiguration const&)'
collect2: error: ld returned 1 exit status

which suggests some linking issue. I tried adding sts to src/core/CMakeLists.txt like so:

if(${TRITON_ENABLE_S3})
  find_package(AWSSDK REQUIRED COMPONENTS s3 sts)
  message(STATUS "Using aws-sdk-cpp ${AWSSDK_VERSION}")
endif()

but that results in an error as well.

Can you help me include the sts package and point to the right places to make changes so it builds successfully?

Thanks.

nsiddharth commented 3 years ago

Here is the CMake error from find_package:

CMake Error at /tmp/tritonbuild/tritonserver/build/aws-sdk-cpp/install/lib/cmake/AWSSDK/AWSSDKConfig.cmake:287 (find_package):
  By not providing "Findaws-cpp-sdk-sts.cmake" in CMAKE_MODULE_PATH this
  project has asked CMake to find a package configuration file provided by
  "aws-cpp-sdk-sts", but CMake did not find one.

  Could not find a package configuration file provided by "aws-cpp-sdk-sts"
  with any of the following names:

    aws-cpp-sdk-stsConfig.cmake
    aws-cpp-sdk-sts-config.cmake

  Add the installation prefix of "aws-cpp-sdk-sts" to CMAKE_PREFIX_PATH or
  set "aws-cpp-sdk-sts_DIR" to a directory containing one of the above files.
  If "aws-cpp-sdk-sts" provides a separate development package or SDK, be
  sure it has been installed.
Call Stack (most recent call first):
  /workspace/src/core/CMakeLists.txt:95 (find_package)

nsiddharth commented 3 years ago

Any pointers? I would love to be able to fix this quickly.

deadeyegoodwin commented 3 years ago

We are not experts on building the AWS sdk. Is the following line the recommended way to build AWS with STS?

-DBUILD_ONLY:STRING=s3|sts

You may need to add additional link dependencies here: https://github.com/triton-inference-server/server/blob/main/src/servers/CMakeLists.txt#L511

nsiddharth commented 3 years ago

Yes, that part seems fine, as it adds the sts package and I can see the header files being included. Linking seems to be the problem. I will try out the additional link dependencies as you have suggested. Thanks!

dyastremsky commented 2 years ago

Closing this issue due to lack of activity. Please re-open it if you would like to follow up.

maafk commented 1 year ago

Here is my workaround when running with the NVIDIA Container Toolkit.

In the Dockerfile, ensure the AWS CLI is installed:

FROM nvcr.io/nvidia/tritonserver:23.06-py3

RUN apt-get update && apt-get install -y --no-install-recommends \
    libgl1-mesa-glx cron && \
    rm -rf /var/lib/apt/lists/*  && \
    curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
    unzip awscliv2.zip && \
    ./aws/install

COPY set_aws_config.sh /set_aws_config.sh
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /set_aws_config.sh && \
    chmod +x /entrypoint.sh

ENTRYPOINT [ "/entrypoint.sh" ]

set_aws_config.sh grabs region info from EC2 Instance metadata.

If running as an ECS task on EC2, ensure credential_source is EcsContainer; otherwise use Ec2InstanceMetadata (ref).

#!/bin/bash

TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
ROLE_NAME=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/iam/security-credentials/)

AZ=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/placement/availability-zone)
REGION=$(echo $AZ | sed 's/[a-z]$//')

aws configure set default.region $REGION

echo "role_arn = $ROLE_NAME" >> ~/.aws/config
echo "credential_source = EcsContainer" >> ~/.aws/config  # use Ec2InstanceMetadata if running on EC2

entrypoint.sh will ensure new containers have the AWS config set up:

#!/bin/sh

/set_aws_config.sh

exec "$@"

Since we're using EC2 instance metadata in set_aws_config.sh, we need to ensure we can access the instance metadata service from within Docker, which requires an extra hop:

aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456abcdefg \
  --http-put-response-hop-limit 2