zwei2016 opened this issue 6 months ago
Hi @zwei2016,
You'll need to install any Python dependencies required by your Python model inside the container before starting the server, for example via pip install ...
You can prep a custom Docker image so you can re-use it across runs as well:
FROM nvcr.io/nvidia/tritonserver:24.03-py3
RUN pip install ...
You can also look into packaging the dependencies along with your Python model through custom execution environments: https://github.com/triton-inference-server/python_backend?tab=readme-ov-file#creating-custom-execution-environments
Thanks Ryan @rmccorm4
I customized the Docker image nvcr.io/nvidia/tritonserver:24.03-py3 by installing the necessary libraries and committed it as a new image. It works. Thank you.
By the way, when I try to query the server as described in the tutorial:

from merlin.systems.triton.utils import send_triton_request
response = send_triton_request(workflow.input_schema, df, output_schema.column_names, endpoint="localhost:8001")

I got another error: Failed to open the cudaIpcHandle. After searching around, I found that the cause might be that CUDA shared memory is not supported on Windows. Since I deployed the server in WSL2 on Windows 11, will I always hit this error? Is there any solution now?
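In case it helps anyone hitting the same error: sending the request with the plain Triton gRPC client instead of send_triton_request keeps all tensors in system memory and should avoid CUDA IPC altogether. A rough, unverified sketch, where the model name, input/output names, shape, and dtype are placeholders to be replaced with your model's actual metadata:

```python
import numpy as np
import tritonclient.grpc as grpcclient

# Plain gRPC client: inputs travel over the wire in system memory,
# so no CUDA shared memory / cudaIpcHandle is involved.
client = grpcclient.InferenceServerClient(url="localhost:8001")

# Placeholder input: replace the name, shape, and dtype with the real ones,
# which can be read from client.get_model_metadata("executor_model").
data = np.array([[1, 2, 3, 4]], dtype=np.int64)
infer_input = grpcclient.InferInput("item_id-list", list(data.shape), "INT64")
infer_input.set_data_from_numpy(data)

# Placeholder output name.
requested_output = grpcclient.InferRequestedOutput("next-item")

response = client.infer(
    model_name="executor_model",
    inputs=[infer_input],
    outputs=[requested_output],
)
print(response.as_numpy("next-item"))
```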
Best,
Wei
Description
I am following this online tutorial: https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/stable/examples/getting-started-session-based/03-serving-session-based-model-torch-backend.ipynb
After creating the model "executor_model", I tried to run the Triton Inference Server with:
Triton Information
What version of Triton are you using? tritonserver:24.03-py3
Are you using the Triton container or did you build it yourself? I am using the container nvcr.io/nvidia/tritonserver:24.03-py3.
To Reproduce
Steps to reproduce the behavior: I followed this online tutorial: https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/stable/examples/getting-started-session-based/03-serving-session-based-model-torch-backend.ipynb
Expected behavior
The server should reply to the client with the following message:
<HTTPSocketPoolResponse status=200 headers={'content-type': 'application/json', 'content-length': '188'}> bytearray(b'[{"name":"0_transformworkflowtriton","version":"1","state":"READY"},{"name":"1_predictpytorchtriton","version":"1","state":"READY"},{"name":"executor_model","version":"1","state":"READY"}]')
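The output above appears to come from a model-repository index query against the server. A minimal sketch of such a readiness check with the plain Triton HTTP client, assuming the default HTTP port 8000:

```python
import tritonclient.http as httpclient

# Connect to Triton's HTTP endpoint (default port 8000); verbose=True makes the
# client print the raw HTTPSocketPoolResponse shown above.
client = httpclient.InferenceServerClient(url="localhost:8000", verbose=True)

print(client.is_server_live())   # True once the server process is up
print(client.is_server_ready())  # True once all models are loaded

# Lists every loaded model and its state; the three models from the tutorial
# (0_transformworkflowtriton, 1_predictpytorchtriton, executor_model) should all be "READY".
print(client.get_model_repository_index())
```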