pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Support for Multi-Model Endpoint with SageMaker #397

Closed pranavpsv closed 4 years ago

pranavpsv commented 4 years ago

I'm not sure if this is the right place to ask, but is there support for Multi-Model Endpoint deployment with SageMaker for TorchServe? Using TorchServe, I tried creating a multi-model endpoint to deploy two Torch models placed in an S3 bucket. I used sagemaker.multidatamodel.MultiDataModel to create this endpoint, but deployment fails with "The primary container for production variant AllTraffic did not pass the ping health check." When I check the CloudWatch logs, there's an error message: "ACCESS_LOG ... "GET /models HTTP/1.1" 404 0"

How do I deploy multiple models to an endpoint on AWS SageMaker through TorchServe?

Note: TorchServe single-model endpoint creation works well with SageMaker (when following this link).
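For anyone debugging the same symptom, here is a minimal probe sketch (assuming the container is running locally with ports 8080 and 8081 published, which is not shown in this thread). TorchServe answers GET /ping on its inference port but serves model management calls such as GET /models on a separate management port, while SageMaker multi-model endpoints expect those management calls on the single serving port, which is consistent with the 404 in the access log.

# Hypothetical local check; assumes the container was started with
# `docker run -p 8080:8080 -p 8081:8081 ...` from the image discussed below.
import requests

inference = "http://localhost:8080"   # the port SageMaker talks to
management = "http://localhost:8081"  # TorchServe's management API

print(requests.get(f"{inference}/ping").status_code)     # 200: health check passes
print(requests.get(f"{inference}/models").status_code)   # 404: MME model listing not served here
print(requests.get(f"{management}/models").status_code)  # 200: listing lives on the management port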

maaquib commented 4 years ago

@pranavpsv I'll investigate this further. Is there a specific Dockerfile you are using for the SageMaker container?

pranavpsv commented 4 years ago

> @pranavpsv I'll investigate this further. Is there a specific Dockerfile you are using for the SageMaker container?


FROM ubuntu:18.04

ENV PYTHONUNBUFFERED TRUE
LABEL com.amazonaws.sagemaker.capabilities.multi-models=true

RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
    fakeroot \
    ca-certificates \
    dpkg-dev \
    g++ \
    python3-dev \
    openjdk-11-jdk \
    curl \
    vim \
    && rm -rf /var/lib/apt/lists/* \
    && cd /tmp \
    && curl -O https://bootstrap.pypa.io/get-pip.py \
    && python3 get-pip.py

RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1

RUN pip install --no-cache-dir psutil \
                --no-cache-dir torch \
                --no-cache-dir torchvision

ADD serve serve
RUN pip install ../serve/

COPY dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh
RUN chmod +x /usr/local/bin/dockerd-entrypoint.sh

# Added to fix the first CloudWatch error: "model-store not found: /opt/ml/model"
RUN mkdir -p /opt/ml/model
RUN mkdir -p /home/model-server/ && mkdir -p /home/model-server/tmp
COPY config.properties /home/model-server/config.properties

WORKDIR /home/model-server
ENV TEMP=/home/model-server/tmp
ENTRYPOINT ["/usr/local/bin/dockerd-entrypoint.sh"]
CMD ["serve"]

Thank you. Above are the contents of my Dockerfile. I added a label at the top for multi-model support; otherwise, it is the same Dockerfile found at this link.

maaquib commented 4 years ago

config.properties

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
number_of_netty_threads=32
job_queue_size=1000
model_store=/opt/ml/model

Dockerfile

Same as above

Notebook changes

import time

from sagemaker.model import Model
from sagemaker.multidatamodel import MultiDataModel
from sagemaker.predictor import RealTimePredictor

model_data_prefix = f's3://{bucket_name}/{prefix}/'
model_data = f's3://{bucket_name}/{prefix}/models/{model_file_name}.tar.gz'
sm_model_name = 'torchserve-densenet161'

torchserve_model = Model(model_data = model_data, 
                         image = image,
                         role  = role,
                         predictor_cls=RealTimePredictor,
                         name  = sm_model_name)

multi_model = MultiDataModel(name              = sm_model_name,
                             model_data_prefix = model_data_prefix,
                             model             = torchserve_model)

multi_model.add_model(torchserve_model.model_data)

endpoint_name = 'torchserve-endpoint-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

predictor = multi_model.deploy(instance_type='ml.m4.xlarge',
                               initial_instance_count=1,
                               endpoint_name = endpoint_name)

Was able to spin up an endpoint with the above changes. However, the model doesn't get registered on server spin-up, and running inference raises the following exception:

ValidationError: An error occurred (ValidationError) when calling the InvokeEndpoint operation: Endpoint torchserve-endpoint-2020-05-28-23-36-41 is not a multi-model endpoint and does not support target model header.

Will reach out to the SageMaker team.
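For context, the call that produced that error is the multi-model invocation API. A minimal sketch of it follows (boto3 sagemaker-runtime; `payload` and the content type are placeholders, and the other names are reused from the notebook above):

import boto3

runtime = boto3.client("sagemaker-runtime")

# TargetModel is the archive's key relative to model_data_prefix; this is the
# field the ValidationError above says the endpoint does not support.
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    TargetModel=f"models/{model_file_name}.tar.gz",
    ContentType="application/octet-stream",  # placeholder content type
    Body=payload,                            # placeholder request bytes
)
print(response["Body"].read())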

maaquib commented 4 years ago

@pranavpsv MultiModel BYOCs have an explicit dependency on MultiModelServer. Supporting TorchServe would require changes from the SageMaker team.

pranavpsv commented 4 years ago

> @pranavpsv MultiModel BYOCs have an explicit dependency on MultiModelServer. Supporting TorchServe would require changes from the SageMaker team.

Thank you for the information and update!

MFreidank commented 4 years ago

@maaquib Could you raise this with that team? Is there anything that can be done from our end to help expedite this process?

sivashakthi commented 4 years ago

We would like to know the status of this issue. Is this feature/fix on the SageMaker team's near-term roadmap?

maaquib commented 4 years ago

@sivashakthi TorchServe is now the default inference server for PyTorch models in SageMaker. It should work. Please let us know if you run into any issues.
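For anyone revisiting this later, a minimal sketch of retrying the MultiDataModel flow on top of the prebuilt SageMaker PyTorch inference image (which now runs TorchServe). The framework version, py_version, and the multi-model behaviour of that image are assumptions here, not something verified in this thread; `role` and `model_data_prefix` are reused from the notebook above.

import time
import sagemaker
from sagemaker import image_uris
from sagemaker.multidatamodel import MultiDataModel

sess = sagemaker.Session()

# Prebuilt PyTorch inference container (TorchServe-based); version and py_version are assumed.
image = image_uris.retrieve(
    framework="pytorch",
    region=sess.boto_region_name,
    version="1.6.0",
    py_version="py3",
    instance_type="ml.m4.xlarge",
    image_scope="inference",
)

multi_model = MultiDataModel(
    name="torchserve-mme-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime()),
    model_data_prefix=model_data_prefix,  # S3 prefix holding the *.tar.gz model archives
    image_uri=image,
    role=role,
    sagemaker_session=sess,
)

predictor = multi_model.deploy(
    instance_type="ml.m4.xlarge",
    initial_instance_count=1,
)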